Hugging Face, an AI and machine learning platform, has acquired XetHub, a Seattle-based company focused on scaling Git for large datasets and AI models. The acquisition aims to enhance Hugging Face’s capabilities in managing and versioning large datasets and models, a critical need as the AI community scales to even larger models and datasets.
“This is the real 🍓—welcome to @xetdata. We’re just getting started!” posted Hugging Face Chief Clement Delangue on X, in reference to OpenAI’s project Strawberry.
“Big models are here to stay,” said Delangue. “What we want is to make the development of AI closer to what software engineering is — make it drastically faster,” he added.
Founded in 2021 by Yucheng Low, Ajit Banerjee, and Rajat Arya, XetHub has developed technology that enables Git to handle terabyte-scale repositories, allowing teams to work efficiently with evolving datasets and models. This acquisition aligns with Hugging Face’s long-term goal of optimising storage and versioning for AI development, moving away from the limitations of Git LFS, which was not designed to handle the immense file sizes typical in AI.
“The XetHub team will help us unlock the next 5 years of growth of HF datasets and models by switching to our own, better version of LFS as storage backend for the Hub’s repos,” said Hugging Face CTO Julien Chaumond.
He further added that XetHub’s technology would unlock significant growth for the platform by enabling more efficient data management. For example, instead of re-uploading entire files, users will only need to upload modified chunks, streamlining updates and reducing storage needs. This improvement is crucial as AI models continue to grow in size, with trillion-parameter models like the BigLlama-3.1-1T already on the horizon.
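The chunk-level upload idea can be illustrated with a minimal sketch. This is not XetHub's or Hugging Face's implementation, and the article does not describe the actual chunking scheme. The sketch assumes fixed-size chunks, SHA-256 content hashes, and an in-memory dictionary standing in for remote storage. The names `split_chunks`, `upload`, `download`, and `CHUNK_SIZE` are hypothetical and used only for illustration.

```python
import hashlib

# Hypothetical, deliberately tiny chunk size so the demo is easy to follow.
CHUNK_SIZE = 4  # bytes


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into consecutive fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Store only chunks the store has not seen before.

    Returns the file's manifest (ordered list of chunk hashes) and the
    number of chunks that actually had to be uploaded.
    """
    manifest: list[str] = []
    uploaded = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:  # unseen content: this chunk must be sent
            store[digest] = chunk
            uploaded += 1
        manifest.append(digest)
    return manifest, uploaded


def download(manifest: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild a file from its manifest of chunk hashes."""
    return b"".join(store[digest] for digest in manifest)


# Assumed stand-in for remote storage: maps chunk hash -> chunk bytes.
store: dict[str, bytes] = {}

v1 = b"AAAABBBBCCCCDDDD"  # first version of a file
v2 = b"AAAABBBBXXXXDDDD"  # second version: only the third chunk changed

manifest_v1, sent_v1 = upload(v1, store)
manifest_v2, sent_v2 = upload(v2, store)

print(sent_v1, sent_v2)
```

In this toy run the first version sends all four chunks, while the second sends only the one chunk that changed. Fixed-size chunking is a simplification: an insertion near the start of a file shifts every later chunk boundary, so real deduplicating systems often use content-defined chunking instead.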
XetHub, which launched in 2021 with backing from Madrona and angel investors, was built by a team experienced in scaling AI infrastructure, including work on Apple’s internal machine learning infrastructure.
The team will now integrate XetHub’s technology into the Hugging Face platform, aiming to make AI collaboration and development easier for its vast community of users.
In the announcement, Yucheng Low, co-founder of XetHub, highlighted the importance of data in AI’s evolution and expressed excitement about joining Hugging Face to continue their mission of enhancing AI collaboration at scale.
Hugging Face is currently handling a significant volume of data, with over 1.3 million model repositories, 450,000 datasets, and 680,000 spaces, totaling 12 petabytes of data stored in LFS. The acquisition of XetHub is expected to help manage this growing demand more efficiently.
Hugging Face’s infrastructure team is also expanding and actively hiring to support the ongoing development of its platform.
Previously, Hugging Face acquired Spain-based startup Argilla for $10 million. Argilla specialises in collaborative software for AI professionals, focusing on data annotation and enhancing NLP with human-machine collaboration. This acquisition helps Hugging Face improve its data annotation capabilities and integrate human feedback into AI model training.
Hugging Face also recently announced that it is profitable. Founded in 2016, Hugging Face secured $235 million at a $4.5 billion valuation in a Series D funding round last year from major players including Google, Amazon, NVIDIA, Salesforce, AMD, Intel, IBM, and Qualcomm.