turbopuffer FTS v2: Up to 20x Faster with Vectorized MAXSCORE for Long LLM Queries

The Engineer

11 Dec 2025 · 3 min read

turbopuffer's FTS v2 runs long, LLM-generated queries up to 20x faster by using a vectorized MAXSCORE algorithm, which matters most for large datasets and complex searches.

FTS v2, the latest version of turbopuffer's in-house text search engine, is significantly faster than its predecessor, especially on long queries generated by large language models (LLMs). The improvement comes from two changes: an optimized storage layout and a more efficient search algorithm. This article focuses on the new search algorithm, which uses vectorized MAXSCORE rather than block-max WAND.

Why It Matters

For those of us working with large datasets and complex queries, performance can make or break an application. turbopuffer's FTS v2 is designed to handle longer, more intricate queries, which are often generated by automated agents or LLMs. These queries can contain dozens of terms, including stopwords, making them particularly challenging for traditional search algorithms.

Key Improvements

Vectorized MAXSCORE: This new approach allows the engine to process long queries much faster than previous methods.
Optimized Storage Layout: A more efficient way of storing and accessing data contributes to overall performance gains.

How It Works

1. Vectorized MAXSCORE

The core innovation in FTS v2 is the use of vectorized MAXSCORE, which is particularly effective for long queries. Here’s how it works:

Vectorization: By processing multiple query terms simultaneously using SIMD (Single Instruction Multiple Data) instructions, the engine can handle large query sizes more efficiently.
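MAXSCORE Calculation: Every query term has an upper bound on the score it can contribute to any document. MAXSCORE uses these bounds to skip work: terms whose combined bounds cannot lift a document above the current top-k threshold stop driving candidate selection, and a candidate is abandoned as soon as its partial score plus the remaining bounds cannot beat that threshold. Instead of evaluating each term sequentially, FTS v2 performs this work for a set of terms in parallel, reducing the overall computational overhead.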
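The pruning idea behind MAXSCORE can be sketched in a few lines of Python. This is a minimal, scalar illustration of the textbook algorithm, not turbopuffer's implementation: the toy `postings` index, the function name `maxscore_top_k`, and the precomputed per-posting scores are all assumptions made for the example, and it assumes every score contribution is positive.

```python
import heapq
from typing import Dict, List, Tuple

# Toy index (hypothetical): term -> [(doc_id, score contribution)], sorted by doc_id.
Postings = Dict[str, List[Tuple[int, float]]]


def maxscore_top_k(postings: Postings, query: List[str], k: int) -> List[Tuple[int, float]]:
    """Return the top-k (doc_id, score) pairs, best first, using simplified MAXSCORE."""
    if k <= 0:
        return []
    # Upper bound per term: the largest contribution found in its posting list.
    bound = {t: max(s for _, s in postings[t]) for t in set(query) if postings.get(t)}
    terms = sorted(bound, key=lambda t: (bound[t], t))  # ascending upper bound
    # prefix[i] = sum of the i smallest upper bounds.
    prefix = [0.0]
    for t in terms:
        prefix.append(prefix[-1] + bound[t])
    lists = [postings[t] for t in terms]
    lookup = [dict(p) for p in lists]  # random access for non-essential terms
    cursors = [0] * len(terms)
    heap: List[Tuple[float, int]] = []  # min-heap of (score, doc_id)
    threshold = 0.0
    first_essential = 0  # terms[:first_essential] are non-essential

    while first_essential < len(terms):
        # Next candidate: smallest unread doc_id among the essential lists.
        heads = [lists[i][cursors[i]][0]
                 for i in range(first_essential, len(terms))
                 if cursors[i] < len(lists[i])]
        if not heads:
            break
        doc = min(heads)
        score = 0.0
        for i in range(first_essential, len(terms)):
            if cursors[i] < len(lists[i]) and lists[i][cursors[i]][0] == doc:
                score += lists[i][cursors[i]][1]
                cursors[i] += 1
        # Add non-essential terms, stopping once the doc cannot beat the threshold.
        for i in range(first_essential - 1, -1, -1):
            if score + prefix[i + 1] <= threshold:
                break
            score += lookup[i].get(doc, 0.0)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
        if len(heap) == k:
            threshold = heap[0][0]
            # Terms whose combined bounds cannot beat the threshold become non-essential.
            while first_essential < len(terms) and prefix[first_essential + 1] <= threshold:
                first_essential += 1
    return sorted(((d, s) for s, d in heap), key=lambda pair: -pair[1])


toy_postings: Postings = {
    "the": [(1, 0.25), (2, 0.25), (3, 0.25), (4, 0.25), (5, 0.25), (6, 0.25)],
    "who": [(2, 1.0), (4, 1.5), (6, 0.5)],
    "band": [(3, 2.0), (4, 3.0)],
}
top2 = maxscore_top_k(toy_postings, ["the", "who", "band"], k=2)
print(top2)
```

In this toy run, "the" becomes non-essential as soon as two documents are in the heap, so its long posting list stops generating candidates, and documents 5 and 6 are never scored at all. A production engine would use skip pointers and block-level bounds rather than the dictionary lookup used here.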

2. Comparison with Block-Max WAND

Block-max WAND is a well-known algorithm for lexical search, but it can struggle with long queries. Here’s why FTS v2 outperforms it:

Efficiency on Long Queries: Block-max WAND's performance degrades as the number of terms increases, while vectorized MAXSCORE maintains high efficiency.
Stopword Handling: FTS v2 is optimized to handle stopwords effectively, which are common in long LLM-generated queries.
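To see why stopwords suit a MAXSCORE-style algorithm, it helps to look at how little a stopword can contribute to a score. The sketch below assumes a BM25-style ranking function (the article does not say which scoring function FTS v2 uses) and uses the IDF form popularized by Lucene. The document frequencies are hypothetical; only the 5M corpus size comes from the benchmark in this article.

```python
import math


def bm25_idf(doc_freq: int, num_docs: int) -> float:
    """BM25 inverse document frequency (the always-positive Lucene-style form)."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


NUM_DOCS = 5_000_000  # size of the benchmark corpus in this article

# Hypothetical document frequencies, chosen only for illustration.
DOC_FREQ = {"the": 4_500_000, "constitution": 20_000}

idf = {term: bm25_idf(df, NUM_DOCS) for term, df in DOC_FREQ.items()}
for term, value in idf.items():
    print(f"{term:>12}: idf = {value:.3f}")
```

Because a term's score upper bound scales with its IDF, a word that appears in most documents has a very small bound. Under MAXSCORE it is among the first terms to become non-essential, so its very long posting list no longer needs to be traversed in full.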

Benchmarks

To demonstrate the performance gains, we ran benchmarks on a 5M-document Wikipedia export dataset. Here are some representative results:

"san francisco":
- FTS v1: 8ms
- FTS v2: 3ms
"the who":
- FTS v1: 57ms
- FTS v2: 7ms
"united states constitution":
- FTS v1: 20ms
- FTS v2: 5ms
"lord of the rings":
- FTS v1: 75ms
- FTS v2: 6ms
"pop singer songwriter born 1989 won best country song time person of the year" (a long, complex query):
- FTS v1: 174ms
- FTS v2: 20ms
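The speedup for each query follows directly from the two latencies. The short script below only restates the numbers listed above; the variable and function names are chosen for this example.

```python
# Latencies in milliseconds as (FTS v1, FTS v2), copied from the benchmark list above.
latencies_ms = {
    "san francisco": (8, 3),
    "the who": (57, 7),
    "united states constitution": (20, 5),
    "lord of the rings": (75, 6),
    "pop singer songwriter born 1989 won best country song time person of the year": (174, 20),
}


def speedup(v1_ms: float, v2_ms: float) -> float:
    """How many times faster v2 is than v1 for a single query."""
    return v1_ms / v2_ms


speedups = {query: speedup(*pair) for query, pair in latencies_ms.items()}
for query, ratio in sorted(speedups.items(), key=lambda kv: -kv[1]):
    print(f"{ratio:5.1f}x  {query}")
```

On these five queries the largest ratio is 12.5x ("lord of the rings"), and the long query comes out at 8.7x. The "up to 20x" figure therefore presumably refers to queries beyond this representative sample.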

Implementation Details

For those interested in the technical nitty-gritty:

SIMD Instructions: The use of SIMD instructions allows for parallel processing of multiple terms, significantly reducing latency.
Data Layout Optimization: The storage layout has been optimized to minimize disk I/O and cache misses, which is crucial for handling large datasets efficiently.
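One way to picture the combination of a cache-friendly layout and vectorized scoring is batch-at-a-time accumulation: scores for a contiguous window of document IDs are summed into a dense buffer, one posting list at a time. The sketch below is an assumption about the general shape of such a loop, not a description of turbopuffer's code, and plain Python does not itself use SIMD. The function name `score_window` and the toy index are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Postings = Dict[str, List[Tuple[int, float]]]  # term -> [(doc_id, score)], sorted by doc_id


def score_window(postings: Postings, terms: List[str], base: int, width: int) -> List[float]:
    """Sum the contributions of `terms` for doc IDs in [base, base + width)."""
    buf = [0.0] * width  # dense, contiguous accumulator for the window
    for term in terms:
        plist = postings.get(term, [])
        # Jump to the first posting with doc_id >= base; (base,) sorts before (base, x).
        start = bisect_left(plist, (base,))
        for doc_id, score in plist[start:]:
            if doc_id >= base + width:
                break
            buf[doc_id - base] += score
    return buf


window_postings: Postings = {
    "the": [(1, 0.25), (2, 0.25), (3, 0.25), (4, 0.25), (5, 0.25), (6, 0.25)],
    "who": [(2, 1.0), (4, 1.5), (6, 0.5)],
    "band": [(3, 2.0), (4, 3.0)],
}
window_scores = score_window(window_postings, ["the", "who", "band"], base=3, width=2)
print(window_scores)
```

The inner loop touches each posting list sequentially and writes into one small array, which is the access pattern that compiled code can map onto SIMD instructions and that keeps cache misses low.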

Conclusion

turbopuffer's FTS v2 is a significant step forward for text search, especially for applications that issue long, complex queries. By combining vectorized MAXSCORE with an optimized storage layout, the new engine runs up to 20x faster than its predecessor. The benchmark results above show the gain on real queries, from two-word lookups to a 15-term LLM-style query.