
Poetiq cuts costs while raising accuracy to 75% on the ARC-AGI-2 PUBLIC-EVAL set using GPT-5.2 X-High, roughly 15 percentage points above the previous state of the art.
Poetiq, a leading AI research and development company, has announced gains in both accuracy and cost efficiency on the ARC-AGI-2 PUBLIC-EVAL dataset using OpenAI's latest language model, GPT-5.2 X-High. The result is a clear step up from the previous state of the art (SOTA).
Poetiq ran its system with GPT-5.2 X-High on the ARC-AGI-2 PUBLIC-EVAL set and reached 75% accuracy at a cost of under $8 per problem. That beats the previous SOTA by approximately 15 percentage points, an improvement in both accuracy and cost efficiency.
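To put the headline figures in context, here is a minimal Python sketch of the arithmetic they imply. It uses only the reported numbers (75% accuracy, roughly 15 points over the previous SOTA, under $8 per problem). The function names are illustrative, and the problem count passed to `max_total_cost` is a hypothetical input, not a figure from Poetiq. Because the gain is approximate and the cost is reported as an upper bound, the outputs are rough estimates and ceilings, not measured values.

```python
def implied_previous_sota(new_accuracy_pct: float, gain_points: float) -> float:
    """Previous best score implied by a new score and a gain in percentage points."""
    return new_accuracy_pct - gain_points


def relative_improvement(new_accuracy_pct: float, old_accuracy_pct: float) -> float:
    """Relative gain of the new score over the old one, as a fraction."""
    return (new_accuracy_pct - old_accuracy_pct) / old_accuracy_pct


def max_cost_per_solved(cost_cap_usd: float, accuracy_pct: float) -> float:
    """Upper bound on spend per correctly solved problem, given a per-problem cost cap."""
    return cost_cap_usd / (accuracy_pct / 100.0)


def max_total_cost(cost_cap_usd: float, num_problems: int) -> float:
    """Upper bound on total spend for a run of num_problems (hypothetical count)."""
    return cost_cap_usd * num_problems


# Reported figures: 75% accuracy, ~15 points over previous SOTA, under $8 per problem.
previous = implied_previous_sota(75.0, 15.0)
print(f"Implied previous SOTA: about {previous:.0f}%")
print(f"Relative improvement: about {relative_improvement(75.0, previous):.0%}")
print(f"Cost per solved problem: under ${max_cost_per_solved(8.0, 75.0):.2f}")
```

Read this way, a 15-point gain from an implied base of about 60% is a relative improvement of roughly a quarter, and the $8 cap per attempted problem works out to a little under $11 per problem actually solved.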
The Poetiq team used its existing harness without any training or optimization specific to GPT-5.2. That the harness worked unmodified suggests both the Poetiq framework and the GPT-5.2 model are robust and adaptable.

If similar performance holds on the SEMI-PRIVATE dataset used in ARC Prize's official testing, GPT-5.2 X-High with Poetiq's framework could post a comparable gain in the official results. The team is optimistic about that outcome and plans to release updated code supporting GPT-5.2 after the holidays.
Poetiq has been at the forefront of AI reasoning research, having previously established new SOTA results on the ARC-AGI-1 and ARC-AGI-2 benchmarks. Its system description covers the methodology and results in detail.
The integration of GPT-5.2 X-High into the Poetiq framework demonstrates a significant step forward in AI reasoning capabilities. The combination of high accuracy and cost efficiency makes this configuration highly attractive for both research and practical applications.
Stay tuned for more updates from Poetiq, including the release of their updated code to support GPT-5.2.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
24 December 2025