AI Agents Show Exponential Decline in Success Rates with Task Duration

The Engineer

9 May 2025

Jacobine's study highlights a critical limit on AI reliability: as tasks get longer, success rates fall exponentially, challenging assumptions about how well AI agents hold up on long tasks.

In a recent study, Matrice Jacobine builds on the empirical work of Kwa et al. (2025) to explore the performance of AI agents on tasks of varying durations. The key finding is that the success rates of these agents decline exponentially as the task length increases, following a simple mathematical model based on a constant failure rate per minute.

What Changed Technically

Exponential Decline Model: Jacobine's research shows that the success rate of AI agents decreases exponentially with task duration. The model assumes a constant probability of failure for each minute the task takes. Because the agent must get through every minute without failing, these per-minute survival probabilities multiply together, which produces an exponential decay in overall success.
Half-Life Concept: Each AI agent can be characterized by its own "half-life," the task duration at which its success rate falls to 50%. Because the decay is exponential, the success rate halves again with each additional half-life (25% at two half-lives, 12.5% at three), so this single number is enough to estimate success at any task length.
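The step from a constant per-minute failure probability to an exponential curve, and from there to the half-life, can be written out explicitly. This is a minimal derivation assuming each minute is an independent trial that the agent fails with the same probability \( p \); the symbol \( p \) is introduced here for illustration and does not come from the study.

```latex
\begin{aligned}
S(t) &= (1 - p)^{t} = e^{t \ln(1 - p)} = e^{-\lambda t},
  \qquad \lambda = -\ln(1 - p) \approx p \ \text{for small } p, \\
S(t_{1/2}) &= \tfrac{1}{2}
  \;\Longrightarrow\; t_{1/2} = \frac{\ln 2}{\lambda},
  \qquad S(t) = \left(e^{-\lambda t_{1/2}}\right)^{t / t_{1/2}} = 2^{-t / t_{1/2}} .
\end{aligned}
```

Under this assumption, a 1% chance of failing per minute (\( p = 0.01 \)) gives \( \lambda \approx 0.01 \) per minute and a half-life of about 69 minutes.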

Why It Matters

For practitioners and researchers, this model provides a clear framework for understanding and predicting the performance of AI agents on longer tasks. Here are the key implications:

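Task Duration Planning: Teams can use an agent's half-life to predict its success rate on a task of a given length, or to find the longest task it can handle at a required success rate, and plan resources accordingly.
Failure Mechanisms Insight: A constant failure rate is what one would expect if a longer task consists of more subtasks, each with roughly the same chance of failure, where failing any one subtask sinks the whole task. Failures then accumulate with the number of steps rather than with any single step becoming harder. This insight can guide improvements in AI agent design and training.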
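For task duration planning, the half-life formula can be run forwards (predicted success for a given task length) or backwards (longest task that meets a success target). The sketch below is a minimal illustration assuming the constant-failure-rate model holds; the function names and the 60-minute half-life are hypothetical, not values from the study.

```python
import math


def success_rate(task_minutes: float, half_life_minutes: float) -> float:
    """Predicted success rate S(t) = 2 ** (-t / t_half) under a constant failure rate."""
    return 2.0 ** (-task_minutes / half_life_minutes)


def max_task_length(target_success: float, half_life_minutes: float) -> float:
    """Longest task (in minutes) whose predicted success rate still meets target_success.

    Solves 2 ** (-t / t_half) = target for t, giving t = t_half * log2(1 / target).
    """
    if not 0.0 < target_success <= 1.0:
        raise ValueError("target_success must be in (0, 1]")
    return half_life_minutes * math.log2(1.0 / target_success)


if __name__ == "__main__":
    half_life = 60.0  # hypothetical agent with a 60-minute half-life
    for minutes in (15, 60, 120, 240):
        print(f"{minutes:>4} min task -> {success_rate(minutes, half_life):.1%} predicted success")
    for target in (0.5, 0.8, 0.99):
        print(f"target {target:.0%} -> tasks up to {max_task_length(target, half_life):.1f} min")
```

With a 60-minute half-life, an 80% success target caps tasks at about 19 minutes, and a 99% target at under one minute, which shows how quickly high reliability requirements shrink the usable task length.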

Technical Details

Model Assumptions:
- Constant Failure Rate: The model assumes a constant probability of failure per unit time (e.g., per minute).
- Exponential Decay: The success rate \( S(t) \) for a task of duration \( t \) is given by \( S(t) = e^{-\lambda t} \), where \( \lambda \) is the constant failure rate, measured per unit time (e.g. per minute).
Data Fit:
- Kwa et al. (2025) provided empirical data on AI agent performance across various tasks.
- Jacobine's exponential model fits this data well, suggesting that a constant failure rate is a good description of the observed decline in success rates.
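One simple way to fit the model to binned success rates is least squares on the log scale: taking logs of \( S(t) = e^{-\lambda t} \) gives \( \ln S = -\lambda t \), a straight line through the origin. This is a minimal sketch, not necessarily how Jacobine fitted the data; the function name and sample numbers are hypothetical and are not taken from Kwa et al. (2025).

```python
import math
from typing import Sequence


def fit_failure_rate(task_minutes: Sequence[float], success_rates: Sequence[float]) -> float:
    """Least-squares estimate of lambda in S(t) = exp(-lambda * t).

    Taking logs gives ln S = -lambda * t, a line through the origin, so the
    least-squares slope is lambda = -sum(t * ln S) / sum(t ** 2).
    Success rates must be strictly positive for the log to be defined.
    """
    if len(task_minutes) != len(success_rates) or len(task_minutes) == 0:
        raise ValueError("need matching, non-empty sequences")
    num = 0.0
    den = 0.0
    for t, s in zip(task_minutes, success_rates):
        if not 0.0 < s <= 1.0:
            raise ValueError(f"success rate {s} must be in (0, 1]")
        num += t * math.log(s)
        den += t * t
    return -num / den


if __name__ == "__main__":
    # Hypothetical binned success rates, for illustration only.
    lengths = [2, 8, 30, 120, 480]
    rates = [0.98, 0.93, 0.74, 0.30, 0.01]
    lam = fit_failure_rate(lengths, rates)
    print(f"lambda = {lam:.4f} per minute, half-life = {math.log(2) / lam:.0f} min")
```

Because the slope is weighted by \( t^2 \), the longest (and often noisiest) bins dominate this fit; a maximum-likelihood fit on individual pass/fail outcomes would be a more careful alternative.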

Implementation Notes

Practical Application: To apply this model, practitioners can:
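- Estimate the failure rate \( \lambda \) for a given AI agent from its observed success rates, then compute its half-life as \( t_{1/2} = \frac{\ln 2}{\lambda} \).
- Use this half-life to estimate the success rate for a task of any duration via \( S(t) = 2^{-t/t_{1/2}} \); for example, a task lasting two half-lives succeeds about 25% of the time.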
Limitations:
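- The model assumes that the chance of failing in any given minute is the same throughout the task and independent of earlier minutes, which may not hold in all scenarios (for example, if an early mistake makes later ones more likely).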
- Further research is needed to validate the model across a wider range of tasks and AI agents.
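The practical steps above, together with a rough check of the constant-failure-rate limitation, can be sketched in Python. This is a minimal sketch assuming \( \lambda \) is already known in per-minute units; the function names `half_life` and `implied_rates` and all sample numbers are hypothetical, not from the study.

```python
import math
from typing import Sequence


def half_life(failure_rate: float) -> float:
    """t_1/2 = ln(2) / lambda, in the same time unit as lambda (e.g. minutes)."""
    if failure_rate <= 0:
        raise ValueError("failure_rate must be positive")
    return math.log(2) / failure_rate


def implied_rates(task_minutes: Sequence[float], success_rates: Sequence[float]) -> list[float]:
    """Per-bin lambda_i = -ln(S_i) / t_i.

    Under a constant failure rate these should be roughly equal; a clear
    upward or downward trend with task length suggests the assumption fails.
    """
    return [-math.log(s) / t for t, s in zip(task_minutes, success_rates)]


if __name__ == "__main__":
    lam = 0.01  # hypothetical failure rate: about 1% per minute
    t_half = half_life(lam)
    print(f"half-life: {t_half:.1f} min")
    for minutes in (10, 60, 240):
        print(f"{minutes} min task -> {2 ** (-minutes / t_half):.1%} predicted success")
    # Hypothetical binned results; not data from Kwa et al. (2025).
    lengths = [5, 30, 120, 480]
    rates = [0.95, 0.74, 0.30, 0.01]
    print([round(r, 4) for r in implied_rates(lengths, rates)])
```

The per-bin check is deliberately crude: it ignores sampling noise, so only a consistent trend across several bins, not a single outlier, should be read as evidence against a constant failure rate.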

Future Work

While Jacobine's findings are promising, several questions remain open:

Generalizability: Does this model apply consistently across different types of tasks and domains?
Improvement Strategies: How can we design AI agents to reduce the failure rate constant \( \lambda \) and extend their half-life?
Real-World Implications: What does this mean for industries that rely heavily on long-duration tasks, such as autonomous vehicles or complex manufacturing processes?

Conclusion

The discovery of an exponential decline in success rates for AI agents with task duration offers a valuable tool for practitioners. By understanding and leveraging the half-life concept, teams can better predict performance, allocate resources, and identify areas for improvement. Future research will be crucial in validating and extending these findings to broader applications.