
As open-source OCR models surge, Qwen 2.5 VL emerges as the standout performer in recent benchmarks, outpacing both closed-source rivals and traditional OCR services.
For several months, we've been evaluating how well vision models handle Optical Character Recognition (OCR). Our initial benchmark focused on closed-source models like GPT, Gemini, and Claude, comparing them to traditional OCR providers such as AWS, Azure, and GCP.
However, this week has been particularly exciting for open-source Large Language Models (LLMs) with the release of several new models. Here's a quick rundown:
This new benchmark focuses specifically on these open-source vision models. The evaluation includes:
Note that DeepSeek-v3 and Llama 3.3 do not yet have native vision support, so they could not be included. And while many other open-source OCR tools exist (PaddleOCR, Docling, Unstructured, etc.), this evaluation focuses solely on open-source Vision-Language Models (VLMs). A follow-up benchmark will revisit traditional OCR models.
As with our main OCR Benchmark, the dataset and methodology here are entirely open-source. You can review (and reproduce) the evaluation yourself via our benchmark repository on GitHub and view all raw data in our Hugging Face repository.
We evaluated the models on 1,000 documents for JSON extraction accuracy, cost, and latency. Here are the major takeaways among the open-source models:

Every model was run on the same 1,000 documents, which span a variety of document types so that results are not skewed toward a single layout. Each model's extracted JSON was then compared against a ground-truth dataset to measure accuracy.
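The article does not spell out how extracted JSON is scored against the ground truth. As an illustration only, one simple option is field-level exact match: flatten both JSON documents into leaf paths and count how many ground-truth fields the model reproduced. The function names (`flatten`, `field_accuracy`, `benchmark_accuracy`) and the sample invoice data are hypothetical. This is a sketch under that assumption, not necessarily the metric used in the benchmark repository.

```python
from statistics import mean
from typing import Any

# Hypothetical scoring sketch: field-level exact match is an assumption made for
# illustration, not necessarily the metric used in the benchmark repository.


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"items[0].sku": value}-style leaf paths."""
    leaves: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            leaves.update(flatten(value, path))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            leaves.update(flatten(value, f"{prefix}[{i}]"))
    else:
        leaves[prefix] = obj
    return leaves


def field_accuracy(predicted: Any, ground_truth: Any) -> float:
    """Fraction of ground-truth leaf fields that the prediction reproduces exactly."""
    truth = flatten(ground_truth)
    if not truth:
        return 1.0  # nothing to extract, so nothing can be wrong
    pred = flatten(predicted)
    # A field counts only if its path exists in the prediction with an equal value.
    matches = sum(1 for path, value in truth.items() if path in pred and pred[path] == value)
    return matches / len(truth)


def benchmark_accuracy(results: list[tuple[Any, Any]]) -> float:
    """Mean per-document accuracy over (predicted, ground_truth) pairs (must be non-empty)."""
    return mean(field_accuracy(pred, truth) for pred, truth in results)


# Hypothetical example document: the model misreads the total.
truth = {
    "invoice_number": "INV-001",
    "total": 120.5,
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
pred = {
    "invoice_number": "INV-001",
    "total": 120.0,
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
print(f"{field_accuracy(pred, truth):.3f}")  # 0.833: 5 of 6 fields match
```

Exact match is deliberately strict: a single misread digit costs the whole field, and list items are compared by position, so a model that returns the right line items in a different order is penalised. A production scorer might normalise values or align list items before comparing.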
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
9 April 2025