
BabyLM returns to EMNLP 2026 with a revamped format including a new multilingual track and updated datasets, challenging researchers to push the boundaries of sample-efficient pretraining.
BabyLM, the challenge focused on sample-efficient pretraining under human-scale data budgets, is back for its fourth year as both a shared task and a workshop at EMNLP 2026. This iteration introduces several updates to keep the competition fresh and relevant for researchers and practitioners.
All data is available through the BabyLM community on Hugging Face.
The MultiLingual track is new this year and focuses on multilingual pretraining. Participants can train models on a mixture of English, Dutch, and Chinese data totaling 100M tokens. Word counts are adjusted by each language's Byte Premium so that each language is fairly represented.
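To make the Byte Premium adjustment concrete, here is a minimal sketch in Python. It assumes that a byte premium is the ratio of bytes needed to encode comparable content in a language relative to English, and that a per-language budget is scaled by multiplying by that ratio. The function name, the mixture shares, and the premium values are hypothetical placeholders for illustration. They are not the organizers' figures or their actual procedure.

```python
from typing import Dict


def byte_budgets(
    english_equivalent_bytes: float,
    shares: Dict[str, float],
    byte_premium: Dict[str, float],
) -> Dict[str, float]:
    """Split an English-equivalent byte budget across languages.

    Each language receives its share of the budget, scaled by its byte
    premium (assumed here: bytes needed relative to English for
    comparable content).
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    missing = set(shares) - set(byte_premium)
    if missing:
        raise KeyError(f"no byte premium for: {sorted(missing)}")
    return {
        lang: english_equivalent_bytes * share * byte_premium[lang]
        for lang, share in shares.items()
    }


# Hypothetical placeholder values, NOT the official BabyLM numbers.
example_shares = {"eng": 0.5, "nld": 0.25, "zho": 0.25}
example_premiums = {"eng": 1.0, "nld": 1.25, "zho": 0.75}

example_budgets = byte_budgets(1_000_000, example_shares, example_premiums)
for lang, budget in example_budgets.items():
    print(f"{lang}: {budget:.0f} bytes")
```

Under this assumption, a language with a premium above 1 gets more raw bytes for the same share, and a language with a premium below 1 gets fewer, so each share carries roughly comparable content.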

The evaluation pipeline will be distributed as an open-source repository building on the 2025 version. The MultiLingual track will be assessed using a combination of zero-shot and finetuning-based tasks across the three languages.
Submissions can be made via ACL Rolling Review (ARR) or directly through OpenReview. Official submission links will be posted on the BabyLM website once they are live.
If you have questions for the organizers or want to connect with other participants, consider joining the BabyLM Slack community.
Original Sources
↗ https://babylm.github.io/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.