Transforming LLMs into Efficient Computational Engines

Models & Research

The Engineer

16 Mar 2026

Researchers at Percepta AI have developed a method that turns a transformer language model into a computational engine, able to execute programs step by step, exactly and at high speed.

Can LLMs Be True Computers?

Language models have shown remarkable capabilities in solving complex mathematical problems, often reaching research-grade solutions. Yet they falter on simpler computational tasks that require many exact steps and long-context handling. Even basic operations such as multiplying two numbers or solving small Sudoku puzzles are difficult for them without external tools.

But what if an LLM could execute these tasks as reliably and efficiently as a traditional computer? A recent result from Percepta AI shows how to turn a transformer model into a computational engine that executes arbitrary C code, running millions of steps in seconds.

How It Works: Building a Computer Inside a Transformer

The key innovation lies in converting C code into tokens that the language model can process directly. This allows the model to execute programs step-by-step, generating an execution trace and streaming results at high speeds. Here’s how it works when solving a min-cost perfect matching problem using the Hungarian algorithm:

Example: Min-Cost Perfect Matching

Input (10×10 Cost Matrix):

61 58 35 86 32 39 41 27 21 42
59 77 97 99 78 21 89 72 35 63
88 85 37 57 59 97 37 29 69 94
32 82 53 20 77 96 21 70 50 61
15 44 81 10 64 36 56 78 20 69
76 35 87 69 16 55 26 37 30 66
86 32 74 94 32 14 24 12 31 70
97 63 20 64 90 21 28 49 89 10
58 52 27 76 61 35 17 91 37 66
42 79 61 26 55 98 70 17 26 86

Output (run statistics):

Tokens per Second (tok/s): 33,868
Total Tokens: 31,676
Lines per Second: 7,363

The model executes the program directly in its transformer weights, producing a readable log and a token trace. In this run it streams results at more than 30k tokens/sec on a CPU.
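For comparison, the same kind of problem can be solved the conventional way. The sketch below is an ordinary Python implementation of the Hungarian algorithm (the standard O(n^3) formulation with row and column potentials), applied to the cost matrix above. It is not Percepta's C program, and it is not the model's execution trace; the function name `hungarian` and its return shape are choices made here purely for illustration.

```python
from typing import List, Tuple

# Cost matrix copied from the example above (rows = workers, columns = jobs).
COST = [
    [61, 58, 35, 86, 32, 39, 41, 27, 21, 42],
    [59, 77, 97, 99, 78, 21, 89, 72, 35, 63],
    [88, 85, 37, 57, 59, 97, 37, 29, 69, 94],
    [32, 82, 53, 20, 77, 96, 21, 70, 50, 61],
    [15, 44, 81, 10, 64, 36, 56, 78, 20, 69],
    [76, 35, 87, 69, 16, 55, 26, 37, 30, 66],
    [86, 32, 74, 94, 32, 14, 24, 12, 31, 70],
    [97, 63, 20, 64, 90, 21, 28, 49, 89, 10],
    [58, 52, 27, 76, 61, 35, 17, 91, 37, 66],
    [42, 79, 61, 26, 55, 98, 70, 17, 26, 86],
]


def hungarian(cost: List[List[int]]) -> Tuple[int, List[int]]:
    """Return (minimum total cost, assignment) with assignment[row] = column."""
    n = len(cost)
    inf = float("inf")
    u = [0] * (n + 1)    # row potentials (1-indexed; index 0 is a sentinel)
    v = [0] * (n + 1)    # column potentials
    p = [0] * (n + 1)    # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)  # back-pointers along the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            # Find the cheapest reduced-cost edge into an unvisited column.
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so that edge becomes tight.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[r][assignment[r]] for r in range(n))
    return total, assignment


total, assignment = hungarian(COST)
print("minimum cost:", total)
print("row -> column:", assignment)
```

A conventional run like this gives a reference answer against which a model-generated trace could be checked.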

Technical Details: Efficient Execution Traces

The core technical idea is a new decoding path that reduces attention lookups from linear scans to logarithmic-time queries. This lets the model perform millions of correct execution steps within a single run. The key points:

Tokenization: C code is tokenized into a format the model can process.
Execution Trace: The model generates an execution trace, step by step.
Attention Optimization: Attention lookups are optimized for logarithmic time complexity, allowing efficient long-step computations.
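The article does not spell out how the logarithmic-time lookup is built, so the sketch below is only an analogy, not Percepta's decoding path. It assumes that a lookup means "find the most recent earlier trace entry for a given address", and contrasts a linear scan over the whole trace with a binary search over a per-address index. The trace format and the names `latest_write_linear` and `IndexedTrace` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Toy execution trace: each entry is one write, as (step, address, value).
trace = [(0, "x", 3), (1, "y", 4), (2, "x", 7), (3, "z", 1), (4, "x", 9)]


def latest_write_linear(trace, address, before_step):
    """Linear scan: walk the trace backwards, O(len(trace)) per lookup."""
    for step, addr, value in reversed(trace):
        if addr == address and step < before_step:
            return value
    return None


class IndexedTrace:
    """Per-address sorted step lists: O(log k) per lookup via binary search."""

    def __init__(self):
        self.steps = defaultdict(list)
        self.values = defaultdict(list)

    def append(self, step, address, value):
        # Steps arrive in increasing order, so each list stays sorted.
        self.steps[address].append(step)
        self.values[address].append(value)

    def latest_write(self, address, before_step):
        steps = self.steps.get(address)
        if not steps:
            return None
        # Number of writes to this address with step < before_step.
        i = bisect_left(steps, before_step)
        return self.values[address][i - 1] if i else None


indexed = IndexedTrace()
for step, addr, value in trace:
    indexed.append(step, addr, value)

# Both strategies answer the same question; only the cost per lookup differs.
for addr, before in [("x", 3), ("x", 5), ("y", 1), ("z", 4)]:
    assert latest_write_linear(trace, addr, before) == indexed.latest_write(addr, before)
```

With a trace of millions of steps, the scan touches every earlier entry on each lookup, while the indexed version touches only a logarithmic number. That difference in scaling is what the "Attention Optimization" point refers to.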

Motivation: Bridging the Gap in Computational Capabilities

While state-of-the-art language models excel at complex mathematical tasks, they struggle with basic exact computation. The gap shows up in benchmarks such as Sudoku-Bench, where unaided solve rates are low.

To bridge this gap, practitioners often use two approaches:

Tool Use: The model writes code, an external interpreter executes it, and the model reports the results. This method significantly improves math reasoning.
Agentic Orchestration: An outer loop stores intermediate state, decomposes tasks, and repeatedly calls the model on short contexts, effectively creating a state machine.
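Both patterns can be sketched in a few lines. The model calls below are hypothetical stubs (`fake_model` and `fake_step_model` stand in for real LLM calls and simply return fixed or computed values), so the code shows only the control flow around the model, not any particular product's API.

```python
from typing import Callable, Dict, List


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call that answers with code."""
    return "result = 123456789 * 987654321"


def run_in_interpreter(code: str) -> Dict[str, object]:
    """External interpreter: run the generated code in a fresh namespace.

    Note: exec() is not a security sandbox; this is for illustration only.
    """
    namespace: Dict[str, object] = {}
    exec(code, {"__builtins__": {}}, namespace)
    return namespace


def tool_use(task: str, model: Callable[[str], str] = fake_model) -> str:
    """Tool use: the model writes code, the interpreter does the arithmetic."""
    code = model(task)
    result = run_in_interpreter(code)["result"]
    return f"The answer is {result}."


def fake_step_model(state: int, chunk: List[int]) -> int:
    """Hypothetical stand-in for one short-context model call."""
    return state + sum(chunk)


def orchestrate(numbers: List[int], chunk_size: int = 4,
                step: Callable[[int, List[int]], int] = fake_step_model) -> int:
    """Agentic orchestration: the outer loop holds the state, not the model."""
    state = 0
    for start in range(0, len(numbers), chunk_size):
        state = step(state, numbers[start:start + chunk_size])
    return state


print(tool_use("Multiply 123456789 by 987654321."))
print(orchestrate(list(range(1, 11))))
```

In both cases the exact computation or the long-lived state sits outside the model.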

These workarounds are effective but highlight a fundamental limitation: LLMs do not reliably perform long, exact computations on their own. The ability to execute programs directly within the model itself represents a significant step forward in addressing this limitation.