
SmolDocling stands out in document conversion for its compact size and end-to-end approach: it handles both content and layout without large models or complex multi-model setups.
Document conversion has advanced significantly with the rise of vision-language models (VLMs). However, most existing solutions rely either on large foundation models or on complex ensembles of specialized models. Enter SmolDocling, a compact 256M-parameter VLM introduced by Nassar et al., which offers an end-to-end solution for converting documents into a structured format. The model captures not only the content and structure of document elements but also their spatial location on the page, which makes it applicable to a wide range of document types.
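To make "content, structure, and spatial location" concrete, here is a minimal Python sketch of how a converted page element could carry all three at once. It is an illustration only: the `Element` class, the tag names, the `loc` attribute, and the 500-step coordinate grid are hypothetical choices made for this example and are not taken from the SmolDocling release or its actual output format.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Element:
    """One document element: what it is, what it says, and where it sits."""
    kind: str                         # hypothetical label, e.g. "title", "text", "table"
    text: str                         # recognized content
    bbox: tuple[int, int, int, int]   # (left, top, right, bottom) in page pixels


def normalize_bbox(bbox, page_width, page_height, grid=500):
    """Map pixel coordinates onto a fixed integer grid so that location
    does not depend on page resolution. The grid size is an assumption."""
    left, top, right, bottom = bbox
    return (
        round(left * grid / page_width),
        round(top * grid / page_height),
        round(right * grid / page_width),
        round(bottom * grid / page_height),
    )


def to_markup(elements, page_width, page_height):
    """Serialize elements in reading order as one tagged line each,
    keeping type, location, and content together."""
    lines = []
    for el in elements:
        l, t, r, b = normalize_bbox(el.bbox, page_width, page_height)
        lines.append(f'<{el.kind} loc="{l},{t},{r},{b}">{escape(el.text)}</{el.kind}>')
    return "\n".join(lines)


page = [Element("title", "Quarterly Report", (100, 200, 900, 300))]
print(to_markup(page, page_width=1000, page_height=2000))
# prints: <title loc="50,50,450,75">Quarterly Report</title>
```

The point of the sketch is the shape of the output: a single sequence in which each element's type, position, and text travel together, rather than separate layout-detection and text-recognition results that have to be merged afterwards.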
The authors have also contributed novel, publicly sourced datasets for chart, table, equation, and code recognition. These datasets are designed to cover a wide range of document types, including business documents, academic papers, technical reports, patents, and forms.

SmolDocling represents a significant step forward in document conversion by offering an ultra-compact, end-to-end solution that captures both content and spatial context. Its robust performance across diverse document types and its modest computational requirements make it a compelling choice for practitioners looking to streamline their document processing workflows. The availability of the model and the upcoming release of the datasets should further facilitate research and development in this area.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
19 March 2025