Exploring Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems

The Engineer

2 Sept 2025

A recent survey maps how researchers are extending LLM reasoning beyond basic conversational ability, through inference scaling, dedicated reasoning training, and agentic systems.

Reasoning has become a critical capability that sets advanced large language model (LLM) systems apart from conventional chatbots. A recent survey by researchers from Salesforce AI Research; Nanyang Technological University; I2R, A*STAR, Singapore; National University of Singapore; and The Hong Kong University of Science and Technology (Guangzhou) reviews the latest advances in LLM reasoning. This article summarizes their findings along two key dimensions: regimes and architectures.

Regimes: Inference Time vs. Dedicated Training

The survey categorizes methods by when reasoning is achieved: either at inference time or through dedicated training.

Inference-Time Reasoning: Techniques that enhance reasoning capabilities during the model's inference phase. This includes:
- Input Level: Constructing high-quality prompts to guide the LLM (e.g., chain-of-thought prompting, few-shot learning).
- Output Level: Refining multiple sampled candidates to improve reasoning quality (e.g., majority voting, reranking).
Dedicated Training for Reasoning: Methods that train the model specifically for reasoning tasks.
- Input Level: Training on specialized datasets or with specialized techniques (e.g., self-supervised learning, data augmentation).
- Output Level: Post-training that shapes how the model produces and refines its outputs (e.g., supervised fine-tuning, reinforcement learning).
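
To make the output-level idea concrete, here is a minimal sketch of majority voting and reranking over sampled candidates. It is illustrative only and not taken from the survey: the function names, the answer normalization, and the hard-coded sample answers are assumptions, and a real pipeline would obtain the answers by sampling an LLM several times and extracting the final answer from each reasoning chain.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one candidate answer")
    # Light normalization (an assumption) so "42" and "42 " count as the same answer.
    normalized = [a.strip().lower() for a in answers]
    return Counter(normalized).most_common(1)[0][0]


def rerank(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the candidate ranked highest by a caller-supplied (hypothetical) scorer."""
    return max(candidates, key=score)


# Stand-in for final answers extracted from five sampled reasoning chains.
sampled_answers = ["42", "42 ", "41", "42", "40"]
print(majority_vote(sampled_answers))
```

Majority voting needs no extra model, only more samples. Reranking instead depends on the quality of the scorer, which in practice might be a reward model or verifier.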

Architectures: Standalone LLMs vs. Agentic Systems

The architecture dimension differentiates between standalone LLMs and agentic compound systems that incorporate external tools or multi-agent collaborations.

Standalone LLMs: Single models that perform reasoning tasks independently.
- Input Level: Techniques like chain-of-thought prompting to guide the model.
- Output Level: Methods such as majority voting to enhance output quality.
Agentic Systems: Compound systems that integrate external tools or multiple agents.
- Generator-Evaluator Pattern: One agent generates hypotheses, and another evaluates them.
- LLM Debate: Multiple LLMs engage in a debate to reach a consensus.
- Recent Innovations: New designs like the generator-evaluator-verifier pattern, which adds a separate verification step.
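
The generator-evaluator pattern can be sketched as a simple loop in which one component proposes a candidate and another accepts it or returns feedback. This is a hedged illustration rather than the survey's implementation: `generate_and_evaluate`, the callable signatures, and the toy agents below are all hypothetical, and in a real system both callables would wrap LLM calls.

```python
from typing import Callable, Optional, Tuple


def generate_and_evaluate(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> str:
    """Alternate generator and evaluator until acceptance or the round limit."""
    feedback: Optional[str] = None
    candidate = ""
    for _ in range(max_rounds):
        # The generator sees the task plus any feedback from the previous round.
        candidate = generate(task, feedback)
        accepted, feedback = evaluate(task, candidate)
        if accepted:
            break
    return candidate


# Toy stand-ins for the two agents (assumptions for demonstration only).
def toy_generate(task: str, feedback: Optional[str]) -> str:
    return "4" if feedback else "5"


def toy_evaluate(task: str, candidate: str) -> Tuple[bool, str]:
    ok = candidate == "4"
    return ok, "" if ok else "2 + 2 is not 5; recompute."


print(generate_and_evaluate("What is 2 + 2?", toy_generate, toy_evaluate))
```

Capping the number of rounds keeps cost bounded when the evaluator never accepts; the last candidate is returned in that case.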

Key Learning Algorithms

The survey covers a range of learning algorithms used to train reasoning models:

Supervised Fine-Tuning: Adjusting pre-trained LLMs using labeled data specific to reasoning tasks.
Reinforcement Learning (RL): Techniques like Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO) to improve model performance through feedback.
Training Reasoners and Verifiers: Specialized training for models that can reason and verify their own outputs.
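
As a rough illustration of how GRPO differs from PPO, GRPO scores each sampled response relative to the other responses drawn for the same prompt instead of relying on a learned value function. The sketch below computes only that group-relative advantage. The function name, the use of the population standard deviation, and the `eps` term are assumptions made for this example, and the full objective (clipping, KL penalty) is omitted.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards for four responses sampled for one prompt (1.0 = correct, 0.0 = wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Correct responses receive positive advantages and incorrect ones negative, so the policy update pushes probability toward the better responses within each group.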

Emerging Trends

Shift from Inference Scaling to Learning to Reason: Models like DeepSeek-R1 reflect a move from relying purely on inference-time scaling toward dedicated reasoning training.
Transition to Agentic Workflows: Systems like OpenAI's Deep Research and Manus Agent are integrating external tools and multi-agent collaborations to enhance reasoning capabilities.

Domain-Specific Reasoning

The survey also highlights the development of domain-specific reasoning systems, which are tailored to specific industries or tasks. These systems often require specialized data and training methods to achieve high accuracy.

Open Challenges

Despite significant progress, several challenges remain:

Evaluation: Developing robust metrics to assess reasoning quality.
Data Quality: Ensuring that training data is diverse and representative.

This survey provides a comprehensive overview of the current state and future directions in LLM reasoning. It serves as a valuable resource for researchers and practitioners looking to advance the capabilities of AI systems in logical inference, problem-solving, and decision-making.