
ADeLe analyzes the cognitive demands of tasks to predict and explain AI model performance more accurately than traditional benchmarks allow.
Researchers from Microsoft and collaborating institutions have developed ADeLe (Annotated-Demand-Levels), a framework for predicting and explaining how AI models will perform on unfamiliar tasks. Instead of reporting a single benchmark score, it assesses the cognitive and knowledge-based abilities each task requires and compares them with the model's capabilities.
Current benchmarks often struggle to provide deep insights into why a model performs well or poorly on specific tasks. ADeLe addresses this gap by introducing a detailed methodology that not only measures performance but also explains it through an ability profile. This profile links outcomes to specific strengths and limitations of the model, offering practitioners valuable insights for model selection and improvement.
18 Cognitive Scales: ADeLe rates tasks on 18 scales that together capture their cognitive and knowledge demands, spanning a wide range of abilities.
Detailed Rubric: The rating process is guided by a detailed rubric originally developed for human tasks. This rubric has been adapted and validated for use with AI models, ensuring consistency and reliability.
Task Rating Process: Each task is rated from 0 to 5 on each of the 18 scales, according to how heavily it draws on the corresponding ability.
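One way to picture the output of this rating step is as a fixed-length vector with one integer level per scale. The minimal Python sketch below assumes only what the text states (18 scales, levels 0 to 5). The article does not name the scales, so they are addressed by index, and `DemandProfile` and `make_profile` are hypothetical helpers, not part of any published ADeLe code.

```python
from dataclasses import dataclass

NUM_SCALES = 18               # ADeLe rates every task on 18 scales
MIN_LEVEL, MAX_LEVEL = 0, 5   # each rating runs from 0 (lowest) to 5 (highest)


@dataclass(frozen=True)
class DemandProfile:
    """Hypothetical container for one task's ratings, one level per scale."""
    levels: tuple[int, ...]

    def __post_init__(self) -> None:
        # Reject profiles with the wrong number of scales or out-of-range levels.
        if len(self.levels) != NUM_SCALES:
            raise ValueError(f"expected {NUM_SCALES} ratings, got {len(self.levels)}")
        for i, level in enumerate(self.levels):
            if not MIN_LEVEL <= level <= MAX_LEVEL:
                raise ValueError(f"scale {i}: level {level} outside {MIN_LEVEL}-{MAX_LEVEL}")

    def peak_demand(self) -> int:
        """Highest level the task reaches on any single scale."""
        return max(self.levels)


def make_profile(sparse: dict[int, int]) -> DemandProfile:
    """Build a profile from {scale_index: level}; unlisted scales default to 0."""
    levels = [MIN_LEVEL] * NUM_SCALES
    for index, level in sparse.items():
        if not 0 <= index < NUM_SCALES:
            raise ValueError(f"scale index {index} out of range")
        levels[index] = level
    return DemandProfile(tuple(levels))
```

For instance, `make_profile({3: 4, 10: 2})` yields a profile whose `peak_demand()` is 4, while `make_profile({0: 6})` raises `ValueError` because 6 lies outside the 0-to-5 range.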
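The process involves two main steps:
Model Evaluation: the model is tested on a battery of rated tasks, and its results are summarized as an ability profile across the 18 scales.
Task Analysis: a new task is rated on the same scales, and its demand profile is compared with the model's ability profile to predict and explain how the model will perform.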
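To make the two steps concrete, here is a deliberately simplified sketch. It assumes a model's results on already-rated tasks are available as (demand profile, passed?) pairs. The ability rule used here (per scale, the highest demand level at which the model succeeds at least half the time) and the prediction rule (succeed only if no demand exceeds the matching ability) are illustrative stand-ins, not the estimation or prediction method the ADeLe researchers use; `estimate_abilities` and `predict_success` are hypothetical names.

```python
from collections import defaultdict

Profile = tuple[int, ...]  # one demand level (0-5) per scale


def estimate_abilities(results: list[tuple[Profile, bool]], threshold: float = 0.5) -> list[int]:
    """Hypothetical per-scale ability estimate (a simplification, not ADeLe's method).

    For each scale, group the model's results by that scale's demand level and
    take the highest level whose success rate reaches `threshold`.
    A scale with no qualifying level gets ability 0.
    """
    if not results:
        raise ValueError("need at least one rated result")
    num_scales = len(results[0][0])
    abilities = []
    for s in range(num_scales):
        tally = defaultdict(lambda: [0, 0])  # level -> [successes, attempts]
        for profile, success in results:
            tally[profile[s]][0] += int(success)
            tally[profile[s]][1] += 1
        passing = [lvl for lvl, (ok, n) in tally.items() if ok / n >= threshold]
        abilities.append(max(passing, default=0))
    return abilities


def predict_success(demands: Profile, abilities: list[int]) -> tuple[bool, list[int]]:
    """Predict success only if no demand exceeds the matching ability.

    Also returns the indices of scales where demand exceeds ability, a rough
    explanation of why a failure is predicted.
    """
    shortfalls = [s for s, (d, a) in enumerate(zip(demands, abilities)) if d > a]
    return (not shortfalls, shortfalls)
```

With three scales for brevity, `estimate_abilities([((1, 0, 2), True), ((2, 1, 0), True), ((4, 0, 1), False), ((3, 2, 3), True)])` returns `[3, 2, 3]`, and `predict_success((5, 1, 0), [3, 2, 3])` returns `(False, [0])`, flagging scale 0 as the shortfall. Returning the shortfall indices alongside the yes/no prediction mirrors the framework's aim of explaining a result, not just forecasting it.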

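For practitioners, ADeLe offers several key benefits: it predicts how a model will perform on unfamiliar tasks, explains successes and failures in terms of specific abilities, and informs model selection and improvement.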
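Consider a simple math problem and an advanced one:
Simple Problem: "What is 2 + 2?" This draws only minimally on mathematical reasoning, so it would sit near the bottom of the relevant scales.
Advanced Problem: "Prove the Pythagorean theorem." This requires sustained formal and logical reasoning, so it would rate much higher on those same scales.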
By rating tasks in this way, ADeLe can provide a clear and detailed understanding of what each task demands from an AI model.
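The same comparison can be written out for the two example problems. In the Python sketch below, the scale names, the ratings for each problem, and the model's ability profile are all hypothetical placeholders chosen for illustration, not annotations published with ADeLe. `extra_demands` shows where the advanced problem asks for more than the simple one, and `unmet_demands` shows which demands a given model would fall short on.

```python
# Hypothetical ratings for the two example problems. The scale names and
# levels are illustrative placeholders, not ADeLe's published annotations.
SIMPLE = {"quantitative": 1, "logical": 0, "formal_knowledge": 1}    # "What is 2 + 2?"
ADVANCED = {"quantitative": 4, "logical": 5, "formal_knowledge": 4}  # "Prove the Pythagorean theorem."


def extra_demands(easier: dict[str, int], harder: dict[str, int]) -> dict[str, int]:
    """Scales on which `harder` asks for more than `easier`, with the size of the increase."""
    return {scale: harder[scale] - easier.get(scale, 0)
            for scale in harder
            if harder[scale] > easier.get(scale, 0)}


def unmet_demands(task: dict[str, int], ability: dict[str, int]) -> dict[str, int]:
    """Scales where the task's demand exceeds the model's ability, and by how much."""
    return {scale: level - ability.get(scale, 0)
            for scale, level in task.items()
            if level > ability.get(scale, 0)}


if __name__ == "__main__":
    model = {"quantitative": 4, "logical": 3, "formal_knowledge": 4}  # hypothetical ability profile
    print(extra_demands(SIMPLE, ADVANCED))  # {'quantitative': 3, 'logical': 5, 'formal_knowledge': 3}
    print(unmet_demands(SIMPLE, model))     # {}
    print(unmet_demands(ADVANCED, model))   # {'logical': 2}
```

Under these made-up numbers, the model clears every demand of the simple problem but falls two levels short on logical reasoning for the proof: a pointed explanation that a single aggregate score does not provide.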
The researchers behind ADeLe are continuing to refine the framework and explore its applications. They aim to expand the number of scales and improve the rubric to cover more diverse tasks and models. Additionally, they plan to integrate ADeLe into broader AI governance frameworks to enhance evaluation and testing practices.
ADeLe represents a significant advancement in AI model evaluation by providing both predictive and explanatory power. By linking performance outcomes to specific cognitive and knowledge abilities, it offers practitioners valuable insights for optimizing model selection and performance. As the framework continues to evolve, it has the potential to transform how we evaluate and deploy AI models in various domains.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
The Engineer, This Week's Edition: 4 June 2025