Hugging Face Acquires XetHub to Build and Scale Millions of Large LLMs

“Big models are here to stay,” says Clem Delangue.

Hugging Face, an AI and machine learning platform, has acquired XetHub, a Seattle-based company focused on scaling Git for large datasets and AI models. The acquisition aims to enhance Hugging Face’s capabilities in managing and versioning large datasets and models, a critical need as the AI community scales to even larger models and datasets.

“This is the real 🍓—welcome to @xetdata. We’re just getting started!” posted Hugging Face CEO Clement Delangue on X, in reference to OpenAI’s Project Strawberry.

“Big models are here to stay,” said Delangue. “What we want is to make the development of AI closer to what software engineering is — make it drastically faster,” he added.

https://twitter.com/julien_c/status/1821540661973160339

Founded in 2021 by Yucheng Low, Ajit Banerjee, and Rajat Arya, XetHub has developed technology that enables Git to handle terabyte-scale repositories, allowing teams to work efficiently with evolving datasets and models. This acquisition aligns with Hugging Face’s long-term goal of optimising storage and versioning for AI development, moving away from the limitations of Git LFS, which was not designed to handle the immense file sizes typical in AI.

“The XetHub team will help us unlock the next 5 years of growth of HF datasets and models by switching to our own, better version of LFS as storage backend for the Hub’s repos,” said Hugging Face CTO Julien Chaumond.

He further added that XetHub’s technology would unlock significant growth for the platform by enabling more efficient data management. For example, instead of re-uploading entire files, users will only need to upload modified chunks, streamlining updates and reducing storage needs. This improvement is crucial as AI models continue to grow in size, with trillion-parameter models like the BigLlama-3.1-1T already on the horizon.
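To illustrate the idea of uploading only modified chunks, here is a minimal Python sketch of chunk-level deduplication. It is not XetHub's or Hugging Face's actual implementation: the gear-style rolling hash, the size parameters, the in-memory `store` dictionary standing in for remote storage, and the names `split_chunks`, `chunk_id` and `upload` are all assumptions made for this example.

```python
import hashlib
import random

# Fixed table of pseudo-random 64-bit values, one per byte value.
# (Hypothetical parameters; a real system would tune these carefully.)
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def split_chunks(data: bytes, min_size=64, max_size=4096, cut_mask=0xFF):
    """Split data at boundaries chosen from the content itself."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Simplified gear-style rolling hash over the most recent bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if (length >= min_size and (h & cut_mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_id(chunk: bytes) -> str:
    """Identify a chunk by the SHA-256 digest of its bytes."""
    return hashlib.sha256(chunk).hexdigest()


def upload(data: bytes, store: dict):
    """Store only chunks that are not already present.

    Returns the ordered list of chunk IDs (a manifest) and the number
    of chunks that actually had to be sent.
    """
    manifest = []
    sent = 0
    for chunk in split_chunks(data):
        key = chunk_id(chunk)
        if key not in store:
            store[key] = chunk
            sent += 1
        manifest.append(key)
    return manifest, sent


# Example: upload a 64 KiB file, then a copy with a small insertion.
data_rng = random.Random(1)
original = bytes(data_rng.getrandbits(8) for _ in range(64 * 1024))
edited = original[:30000] + b"a small edit" + original[30000:]

store = {}
manifest_v1, sent_v1 = upload(original, store)
manifest_v2, sent_v2 = upload(edited, store)
print(f"first upload sent {sent_v1} chunks, second sent {sent_v2}")
```

Because chunk boundaries depend on the bytes themselves rather than on fixed offsets, an insertion in the middle of a file changes only the chunks near the edit, so the second upload sends far fewer chunks than the first. The manifest is enough to rebuild either version from the shared store.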

Backed by Madrona and angel investors, XetHub was built by a team experienced in scaling AI infrastructure, including work on Apple’s internal machine learning infrastructure.

The team will now integrate XetHub’s technology into the Hugging Face platform, aiming to make AI collaboration and development easier for its vast community of users.

In the announcement, Yucheng Low, co-founder of XetHub, highlighted the importance of data in AI’s evolution and expressed excitement about joining Hugging Face to continue their mission of enhancing AI collaboration at scale.

Hugging Face is currently handling a significant volume of data, with over 1.3 million model repositories, 450,000 datasets, and 680,000 spaces, totaling 12 petabytes of data stored in LFS. The acquisition of XetHub is expected to help manage this growing demand more efficiently.

Hugging Face’s infrastructure team is also expanding and actively hiring to support the ongoing development of its platform.

Previously, Hugging Face acquired Spain-based startup Argilla for $10 million. Argilla specialises in collaborative software for AI professionals, focusing on data annotation and enhancing NLP with human-machine collaboration. This acquisition helps Hugging Face improve its data annotation capabilities and integrate human feedback into AI model training.

Hugging Face also recently announced that it is profitable. Founded in 2016, the company secured $235 million at a $4.5 billion valuation in a Series D funding round last year from major players including Google, Amazon, NVIDIA, Salesforce, AMD, Intel, IBM, and Qualcomm.


Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.