Poetiq Leverages GPT-5.2 X-High to Achieve 75% Accuracy on PUBLIC-EVAL at Under $8 per Problem

Models & Research

The Engineer

24 Dec 2025 · 2 min read

Using GPT-5.2 X-High, Poetiq reports 75% accuracy on PUBLIC-EVAL at under $8 per problem, roughly 15 percentage points above the previous state of the art.

Poetiq, an AI research and development company, has announced gains in both accuracy and cost efficiency on the PUBLIC-EVAL dataset using GPT-5.2 X-High, the latest iteration of OpenAI's language model. The company says the result is a clear step up from previous state-of-the-art (SOTA) results.

Technical Changes and Impact

Poetiq ran its system with GPT-5.2 X-High on the ARC-AGI-2 PUBLIC-EVAL set, reaching 75% accuracy at a cost of under $8 per problem. That beats the previous SOTA by approximately 15 percentage points, improving accuracy while keeping the price per problem low.

Accuracy: Achieved 75% accuracy on PUBLIC-EVAL.
Cost Efficiency: Under $8 per problem.
Improvement Over Previous Models: ~15 percentage points higher than the previous SOTA.
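
To put the reported figures in context, the short Python sketch below derives what they imply: the previous SOTA level, the relative improvement, and an upper bound on cost per solved problem. It uses only the three numbers quoted above (75%, about 15 percentage points, under $8). The function names are illustrative, and the outputs are approximations because the source figures are themselves rounded.

```python
def implied_baseline(accuracy_pct: float, gain_pp: float) -> float:
    """Previous SOTA accuracy implied by the new accuracy and the gain in points."""
    return accuracy_pct - gain_pp


def relative_gain_pct(accuracy_pct: float, gain_pp: float) -> float:
    """Gain expressed relative to the implied baseline, in percent."""
    return gain_pp / implied_baseline(accuracy_pct, gain_pp) * 100.0


def cost_per_solved(cost_per_problem: float, accuracy_pct: float) -> float:
    """Average spend per correctly solved problem."""
    return cost_per_problem / (accuracy_pct / 100.0)


# Figures as reported in the article; $8 is an upper bound, not an exact cost.
ACCURACY_PCT = 75.0
GAIN_PP = 15.0
COST_CAP_USD = 8.0

print(f"Implied previous SOTA: ~{implied_baseline(ACCURACY_PCT, GAIN_PP):.0f}%")
print(f"Relative improvement:  ~{relative_gain_pct(ACCURACY_PCT, GAIN_PP):.0f}%")
print(f"Cost per solved problem: under ${cost_per_solved(COST_CAP_USD, ACCURACY_PCT):.2f}")
```

A 15-point gain on a baseline near 60% is a relative improvement of about 25%, which is why percentage points and percent should not be used interchangeably here.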

Implementation Details

The Poetiq team used its existing harness without any training or optimization specific to GPT-5.2. That an unchanged harness delivered these results with a new model suggests the Poetiq framework adapts readily to new models.

Harness: No changes to the Poetiq harness.
Model: GPT-5.2 X-High, provided by OpenAI.
Dataset: Full PUBLIC-EVAL dataset.
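
As a rough illustration of what "no changes to the harness" can mean in practice, the sketch below shows an evaluation loop in which the model is just a parameter. This is not Poetiq's code: the `Task`, `Attempt`, `evaluate`, and `echo_solver` names, the grid representation, and the cost values are all hypothetical, chosen only to show how accuracy and cost per problem could be measured without touching the harness when the model changes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Grid = list[list[int]]  # hypothetical task representation


@dataclass(frozen=True)
class Task:
    task_input: Grid
    expected: Grid


@dataclass(frozen=True)
class Attempt:
    prediction: Grid
    cost_usd: float  # spend attributed to this single problem


# A solver maps (model name, task input) to an attempt. Swapping models
# means passing a different name; the harness itself does not change.
Solver = Callable[[str, Grid], Attempt]


def evaluate(solver: Solver, model_name: str, tasks: Sequence[Task]) -> tuple[float, float]:
    """Return (accuracy, average cost per problem) over all tasks."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    solved = 0
    total_cost = 0.0
    for task in tasks:
        attempt = solver(model_name, task.task_input)
        solved += int(attempt.prediction == task.expected)
        total_cost += attempt.cost_usd
    return solved / len(tasks), total_cost / len(tasks)


def echo_solver(model_name: str, grid: Grid) -> Attempt:
    """Stand-in solver: returns the input unchanged at a made-up fixed cost."""
    return Attempt(prediction=grid, cost_usd=0.5)


# Toy data: the first task is solved by echoing the input, the second is not.
toy_tasks = [Task([[1]], [[1]]), Task([[1, 2]], [[2, 1]])]
accuracy, avg_cost = evaluate(echo_solver, "placeholder-model", toy_tasks)
print(f"accuracy={accuracy:.2f}, avg cost per problem=${avg_cost:.2f}")
```

Keeping the model behind a single parameter is one common way to make results comparable across models, since everything else in the evaluation stays fixed.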

Potential Implications

These results are on the public set only. If similar performance holds on the SEMI-PRIVATE dataset used in ARC Prize's official testing, GPT-5.2 X-High with Poetiq's framework would represent a significant step forward on the official benchmark. The team is optimistic about that outcome and plans to release updated code supporting GPT-5.2 after the holidays.

Background

Poetiq has been at the forefront of AI reasoning research, previously establishing new SOTA results on the ARC-AGI-1 & 2 benchmarks. Their system description provides detailed insights into their methodology and achievements:

ARC-AGI Benchmarks: Poetiq's previous work set new standards in performance and efficiency.
System Description: Available at Poetiq’s website.

Conclusion

Running GPT-5.2 X-High in the Poetiq framework marks a significant step forward in AI reasoning performance on PUBLIC-EVAL. The combination of 75% accuracy and a cost under $8 per problem makes this configuration attractive for both research and practical applications.

Stay tuned for more updates from Poetiq, including the release of their updated code to support GPT-5.2.