
Image-based RAG tools at Morphik transform document search by preserving visual context, offering a more accurate and reliable way to extract information from complex PDFs than traditional OCR methods.
If you've ever tried to extract information from a complex PDF (think invoices with nested tables, research papers with intricate figures, or technical manuals with annotated diagrams), you know the frustration. Despite our best efforts, traditional OCR and parsing pipelines often mangle these documents, losing crucial visual context. At Morphik, we’ve taken a different approach: using images of pages directly in our Retrieval-Augmented Generation (RAG) tools. This method preserves the visual integrity of documents, making search more accurate and reliable.
Parsing complex documents is notoriously difficult. Even on simple, text-heavy pages, common OCR tools can jumble headings and values, so retrieval returns incorrect information. For example, consider a straightforward page showing Palantir's financials (Fig 1). Extracted as text, it comes out like this:
Q1
Financials
US commercial continues to accelerate in Q1 2025 alongside AIP revolution
+71% Y/Y
+65% Y/Y
US Commercial Revenue
US Commercial Customer Count
+19% Q/Q
+13% Q/Q
US Commercial Revenue
US Commercial Customer Count
+127% Y/Y
$810M
US Commercial Remaining Deal Value
2x Y/Y
US Commercial Total Contract Value
+30% Q/Q
US Commercial Deals Closed of $1M or Greater
+183% Y/Y
US Commercial Remaining Deal Value
US Commercial Total Contract Value
2025 Palantir Technologies Inc.
We define a customer as an organization from which we have recognized revenue during the trailing twelve months period.
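This is one of the simplest documents you might encounter, yet the values and headings come out jumbled: percentages are separated from the metrics they describe, and some labels repeat with no value attached. That makes it hard to retrieve accurate information. Now imagine dealing with more complex or technical documents.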
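To see why the text comes out in that order, here is a minimal sketch of how a naive top-to-bottom, left-to-right flattening separates values from their labels. The `Box` type, the coordinates, and the `column_gap` threshold are all hypothetical, chosen only to mimic two side-by-side metrics; they are not taken from any real OCR tool's output.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x: float  # left edge of the text box (hypothetical units)
    y: float  # top edge; smaller means higher on the page


# Hypothetical layout: two metrics side by side, each a value above its label.
boxes = [
    Box("+71% Y/Y", x=100, y=200),
    Box("US Commercial Revenue", x=100, y=240),
    Box("+65% Y/Y", x=400, y=200),
    Box("US Commercial Customer Count", x=400, y=240),
]


def row_major(boxes):
    """Flatten top-to-bottom, then left-to-right, like a naive text dump."""
    return [b.text for b in sorted(boxes, key=lambda b: (b.y, b.x))]


def column_aware(boxes, column_gap=150):
    """Group boxes into columns by x position, then read each column top-down."""
    columns = []
    for b in sorted(boxes, key=lambda b: b.x):
        if columns and b.x - columns[-1][-1].x < column_gap:
            columns[-1].append(b)  # close enough horizontally: same column
        else:
            columns.append([b])    # far enough right: start a new column
    return [[b.text for b in sorted(col, key=lambda b: b.y)] for col in columns]


print(row_major(boxes))     # both values first, then both labels
print(column_aware(boxes))  # each value stays with its own label
```

The row-major order reproduces the pattern in the extracted text above, with both percentages ahead of both labels. The column-aware version only works because this toy layout is regular. Real pages need layout heuristics that break easily, and that fragility is what keeping the page as an image avoids.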
Using images of pages instead of parsed text preserves the visual context that matters most. This approach is particularly useful for documents with charts, diagrams, and tables. Here’s why:
When we started building RAG at Morphik, we initially followed the standard approach: assembling a document processing pipeline with OCR, layout detection, and parsing. However, this method is brittle and error-prone:

At Morphik, we’ve shifted to using images directly in our RAG tools. Here’s a high-level overview of our approach:
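The general shape of image-based retrieval can be sketched in a few lines. This is a generic illustration under stated assumptions, not Morphik's actual pipeline: it assumes each page has already been rendered to an image and embedded by some multimodal model. The `PageRecord` type, the file paths, and the three-dimensional vectors are made up for the example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PageRecord:
    doc_id: str
    page_number: int
    image_path: str         # rendered page image, stored whole (hypothetical path)
    embedding: List[float]  # assumed to come from a multimodal embedding model


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, index, k=1):
    """Return the k page records most similar to the query embedding."""
    ranked = sorted(
        index,
        key=lambda page: cosine(query_embedding, page.embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy index with made-up embeddings; a real one would hold model outputs.
index = [
    PageRecord("report", 1, "report/page-1.png", [0.9, 0.1, 0.0]),
    PageRecord("report", 2, "report/page-2.png", [0.1, 0.8, 0.3]),
]

query = [0.2, 0.9, 0.2]  # stand-in for an embedded user question
top = retrieve(query, index, k=1)
print(top[0].image_path)  # this image, not parsed text, is passed to the model
```

The point of the design is that the retrieved unit is the page image itself, so the charts, tables, and spatial relationships on the page reach the model unchanged.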
Using image-based RAG tools offers several benefits:
While using images is often the best approach for RAG, there are scenarios where you still need structured text:
In these cases, passing the original document to a Language Model (LM) for processing is
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
22 July 2025