Simplify Document Search with Image-Based RAG Tools

Tools & Engineering

The Engineer

22 Jul 2025 · 4 min read

Image-based RAG tools at Morphik transform document search by preserving visual context, offering a more accurate and reliable way to extract information from complex PDFs than traditional OCR methods.

If you've ever tried to extract information from a complex PDF (think invoices with nested tables, research papers with intricate figures, or technical manuals with annotated diagrams), you know the frustration. Despite our best efforts, traditional OCR and parsing pipelines often mangle these documents, losing crucial visual context. At Morphik, we’ve taken a different approach: using images of pages directly in our Retrieval-Augmented Generation (RAG) tools. This method preserves the visual integrity of documents, making search more accurate and reliable.

The Pain of Traditional Parsing

Parsing complex documents is notoriously difficult. Even on simple, text-heavy pages, common OCR tools can jumble headings and values, so that retrieval returns incorrect information. For example, consider a straightforward page showing Palantir's financials (Fig 1), which comes out of extraction as the following text:

Q1
Financials
US commercial continues to accelerate in Q1 2025 alongside AIP revolution
+71% Y/Y
+65% Y/Y
US Commercial Revenue
US Commercial Customer Count
+19% Q/Q
+13% Q/Q
US Commercial Revenue
US Commercial Customer Count
+127% Y/Y
$810M
US Commercial Remaining Deal Value
2x Y/Y
US Commercial Total Contract Value
+30% Q/Q
US Commercial Deals Closed of $1M or Greater
+183% Y/Y
US Commercial Remaining Deal Value
US Commercial Total Contract Value
2025 Palantir Technologies Inc.
We define a customer as an organization from which we have recognized revenue during the trailing twelve months period.

This is one of the simplest documents you might encounter, yet the values and headings in the extracted text are jumbled: figures such as "+71% Y/Y" are separated from the labels they belong to, making it hard to retrieve accurate information. Now imagine dealing with more complex or technical documents.
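
One way this kind of jumbling can arise is through reading order. The sketch below is illustrative only: the grid positions are assumed, not taken from the actual slide or from any particular OCR tool. It places four text boxes in two columns and compares a naive row-by-row read with a column-aware read.

```python
from typing import NamedTuple


class TextBox(NamedTuple):
    text: str
    row: int  # assumed vertical position (0 = top)
    col: int  # assumed horizontal position (0 = left)


# Hypothetical layout: each metric sits directly above its label.
boxes = [
    TextBox("+71% Y/Y", row=0, col=0),
    TextBox("+65% Y/Y", row=0, col=1),
    TextBox("US Commercial Revenue", row=1, col=0),
    TextBox("US Commercial Customer Count", row=1, col=1),
]


def row_major(items):
    """Read left to right, one row at a time (a naive flattening)."""
    return [b.text for b in sorted(items, key=lambda b: (b.row, b.col))]


def column_major(items):
    """Read each column top to bottom, keeping a value next to its label."""
    return [b.text for b in sorted(items, key=lambda b: (b.col, b.row))]


print(row_major(boxes))
print(column_major(boxes))
```

The row-by-row read yields both values first and both labels afterwards, the same order as in the extracted text above, so nothing in the text ties a value to its label. A page image keeps that spatial relationship intact.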

Why Images Matter

Using images of pages instead of parsed text preserves the visual context that matters most. This approach is particularly useful for documents with charts, diagrams, and tables. Here’s why:

Visual Integrity: Images retain the exact layout and formatting of the original document, ensuring that information isn’t lost or misinterpreted.
Accuracy in Retrieval: When search runs over page images, the RAG tool sees each value alongside its label, caption, or axis, which leads to more accurate results.
Simplicity: You avoid the complexity and potential errors introduced by OCR and parsing pipelines.

The Traditional Approach: A House of Cards

When we started building RAG at Morphik, we followed the standard approach: assembling a document processing pipeline with OCR, layout detection, and parsing. This pipeline proved brittle and error-prone, because each stage depends on the output of the one before it:

OCR Limitations: OCR tools often struggle with complex layouts, leading to incorrect or missing information.
Layout Detection: Detecting and preserving the correct layout of elements like tables and figures is challenging.
Parsing Errors: Even minor errors in parsing can lead to significant issues when retrieving information.
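
A quick back-of-the-envelope calculation shows why a chain of stages is fragile. The per-stage accuracies below are made-up numbers for illustration, not measurements, and the sketch assumes that errors in different stages are independent.

```python
from math import prod

# Hypothetical per-stage success rates (illustrative, not measured).
stage_accuracy = {
    "ocr": 0.95,
    "layout_detection": 0.95,
    "parsing": 0.95,
}

# If errors are independent, a page survives the pipeline only when
# every stage gets it right, so the rates multiply.
end_to_end = prod(stage_accuracy.values())

print(f"end-to-end accuracy: {end_to_end:.3f}")
```

Under these assumptions, three stages that are each right 95% of the time produce a fully correct page only about 86% of the time, and every additional stage lowers that figure further.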

How We Do It at Morphik

At Morphik, we’ve shifted to using images directly in our RAG tools. Here’s a high-level overview of our approach:

Image Ingestion: Each page of a document is rendered as an image (e.g., PNG or JPEG) without any OCR or parsing.
Visual Search: Our RAG tool uses vision-language models to understand and search the content within these images.
Contextual Retrieval: When a user searches, the tool retrieves the relevant sections of the document, maintaining the original visual context.
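
The three steps above can be sketched roughly as follows. This is not Morphik's implementation: the `PageImage` record, the file paths, and the hand-typed embedding vectors are all hypothetical, and plain cosine similarity over one vector per page stands in for whatever scoring a real vision-language retriever uses. The point is only that the unit being indexed and returned is the page image, not parsed text.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class PageImage:
    doc_id: str
    page_number: int
    image_path: str         # rendered page image (hypothetical path)
    embedding: list[float]  # stand-in for a vision-language model's output


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_pages(query_embedding, index, k=1):
    """Return the k page images most similar to the query embedding."""
    ranked = sorted(
        index,
        key=lambda page: cosine_similarity(query_embedding, page.embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy index: real embeddings would come from a model, not be typed by hand.
index = [
    PageImage("report.pdf", 1, "report/page-1.png", [0.9, 0.1, 0.0]),
    PageImage("report.pdf", 2, "report/page-2.png", [0.1, 0.8, 0.3]),
    PageImage("manual.pdf", 1, "manual/page-1.png", [0.0, 0.2, 0.9]),
]

query_embedding = [0.2, 0.9, 0.2]  # stand-in for an embedded user question
top = retrieve_pages(query_embedding, index, k=1)
print(top[0].doc_id, top[0].page_number, top[0].image_path)
```

Because each hit carries the path to the original page image, that image can be handed to a vision-language model as context, so the answer is grounded in the page as it was laid out rather than in a flattened transcription.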

Benefits for Practitioners

Using image-based RAG tools offers several benefits:

Improved Accuracy: By preserving the visual integrity of documents, you get more accurate search results.
Reduced Complexity: You avoid the overhead and potential errors introduced by traditional parsing pipelines.
Enhanced User Experience: Users can quickly find and understand information without dealing with jumbled or missing content.

When to Use OCR

While using images is often the best approach for RAG, there are scenarios where you still need structured text:

Structured Databases: For syncing information into databases or data lakes, converting documents to structured formats remains essential.
Data Analysis: If you need to perform detailed analysis on the extracted data, OCR can be useful despite its quirks.

In these cases, passing the original document to a Language Model (LM) for processing is