
Hugging Face's purchase of Seattle-based XetHub aims to streamline AI development with advanced Git tooling that handles massive datasets efficiently, a move intended to change how developers manage machine learning projects.
Hugging Face has announced the acquisition of XetHub, a Seattle-based company founded by Yucheng Low, Ajit Banerjee, and Rajat Arya. These founders previously worked at Apple, where they built and scaled the internal ML infrastructure. XetHub's mission is to bring software engineering best practices to AI development, and their technology allows Git to handle terabyte-scale repositories efficiently.
The acquisition of XetHub by Hugging Face (HF) marks a significant step in enhancing the management of large datasets and models. Julien Chaumond, HF CTO, emphasized that this move will "unlock the next 5 years of growth" for HF datasets and models by transitioning to a more optimized storage backend.
When HF built the first version of the Hub in 2020, they chose Git LFS (Large File Storage) as the foundation. Git LFS was a reasonable choice at the time: it was familiar to developers and good enough to bootstrap the Hub's usage. However, it versions each large file as a whole, which becomes a limitation with the massive files common in AI development.
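XetHub’s technology addresses these limitations by splitting files into chunks and deduplicating them, so that only the chunks that changed need to be stored and transferred.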

Imagine you have a 10GB Parquet file and need to add a single row. With current methods, you would need to re-upload the entire 10GB file. XetHub’s chunked files and deduplication will allow you to upload only the few chunks containing the new data, significantly reducing storage and bandwidth usage.
Similarly, suppose a user like @bartowski wants to update a single metadata value in the GGUF header of a Llama 3.1 405B model repository. With XetHub’s technology, they can re-upload only the specific chunk containing the new data instead of the whole file, making the process much more efficient.
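The saving described in the two examples above can be illustrated with a small sketch. It is not XetHub's implementation, and the article does not say which chunking scheme XetHub uses. It assumes content-defined chunking, one common way to get chunk-level deduplication, with a simple gear-style rolling hash. The names `split_chunks`, `ChunkStore`, and `demo`, and the size parameters, are hypothetical and chosen only for illustration.

```python
import hashlib
import random

# Hypothetical parameters, chosen only for illustration.
MIN_SIZE = 2 * 1024    # never cut a chunk shorter than this
MAX_SIZE = 64 * 1024   # always cut once a chunk reaches this length
BOUNDARY_BITS = 13     # a cut is expected roughly every 2**13 bytes past MIN_SIZE

MASK64 = (1 << 64) - 1
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # fixed random value per byte


def split_chunks(data: bytes) -> list:
    """Cut data wherever a rolling hash of the recent bytes hits a fixed pattern.

    Boundaries depend on content, not on absolute offsets, so data that is
    unchanged tends to produce the same chunks after an edit elsewhere.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        at_boundary = size >= MIN_SIZE and (h >> (64 - BOUNDARY_BITS)) == 0
        if at_boundary or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0  # restart the hash for the next chunk
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkStore:
    """Toy stand-in for a deduplicating storage backend."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def upload(self, data: bytes) -> int:
        """Store data and return how many bytes actually had to be sent."""
        sent = 0
        for chunk in split_chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:  # only unseen chunks are transferred
                self.chunks[key] = chunk
                sent += len(chunk)
        return sent


def demo():
    """Upload a file, append one 'row', upload again; return both byte counts."""
    n = 256 * 1024
    original = random.Random(1).getrandbits(8 * n).to_bytes(n, "little")
    store = ChunkStore()
    first = store.upload(original)
    second = store.upload(original + b"one more row")
    return first, second
```

Calling `demo()` returns the bytes sent for the first upload and for the upload after the append. In this sketch the second figure can never exceed one maximum-size chunk plus the appended bytes, because every chunk before the final one is cut identically in both versions and is therefore already in the store.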
XetHub brings a talented team of 12 members to Hugging Face. Yucheng Low and his co-founders have already made significant contributions to the AI community, and their expertise will be crucial in driving HF’s next phase of growth. You can follow the new XetHub team at hf.co/xet-team.
As the field moves toward trillion-parameter models, the need for efficient storage and versioning solutions becomes even more critical. Hugging Face’s acquisition of XetHub is a strategic move that will not only enhance their existing offerings but also set new standards in AI repository management.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
9 August 2024