
Researchers from Google DeepMind and two universities have developed Generative Value Learning, which uses vision-language models to predict task progress in robots and could change how robotic manipulation is taught and executed.
A team of researchers from Google DeepMind, the University of Pennsylvania, and Stanford University has introduced Generative Value Learning (GVL), a novel approach that leverages vision-language models (VLMs) to predict task progress in robotic manipulation scenarios. This work, accepted at ICLR 2025, addresses the challenge of creating universal value function estimators that can generalize across diverse tasks and domains.
Traditionally, predicting temporal progress in robotics has required extensive labeled data and specialized models for each task. GVL breaks this mold by drawing on the world knowledge already embedded in VLMs, so that a single, versatile model can estimate progress across many tasks.
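To make the idea concrete, here is a minimal sketch of a per-frame progress query. It assumes, following the GVL paper's description, that frames are shuffled before being shown to the model so that it cannot simply read progress off the frame order. The `ProgressModel` interface, `estimate_progress`, and `toy_model` are hypothetical names introduced for illustration; a real system would replace `toy_model` with a call to a VLM.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical interface (an assumption, not the authors' API): takes a task
# description and a sequence of frames, and returns one progress estimate in
# [0, 100] per frame, in the same order the frames were given.
ProgressModel = Callable[[str, Sequence[object]], List[float]]


def estimate_progress(
    task: str,
    frames: Sequence[object],
    model: ProgressModel,
    seed: int = 0,
) -> List[float]:
    """Return per-frame progress estimates in chronological order."""
    # Shuffle frame indices so the model cannot rely on temporal order.
    order = list(range(len(frames)))
    random.Random(seed).shuffle(order)
    shuffled = [frames[i] for i in order]

    predictions = model(task, shuffled)
    if len(predictions) != len(frames):
        raise ValueError("model must return one value per frame")

    # Undo the shuffle: predictions[k] belongs to original frame order[k].
    values = [0.0] * len(frames)
    for k, original_index in enumerate(order):
        values[original_index] = predictions[k]
    return values


def toy_model(task: str, frames: Sequence[object]) -> List[float]:
    """Stand-in for a VLM call: each 'frame' here is just a number in [0, 1]."""
    return [100.0 * float(f) for f in frames]


if __name__ == "__main__":
    frames = [0.0, 0.25, 0.5, 1.0]
    print(estimate_progress("fold the towel", frames, toy_model))
```

The only real logic is the bookkeeping that maps shuffled predictions back to their original positions; everything model-specific is hidden behind the stub.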
For robotics researchers and engineers, GVL offers several practical benefits:

GVL is built on top of an existing pretrained VLM rather than a newly trained model; the paper's experiments use Gemini 1.5 Pro as the backbone.
To demonstrate GVL's capabilities, the researchers provide an interactive demo where users can select from a variety of robotic manipulation tasks and visualize the model’s predictions frame by frame. This tool is particularly useful for understanding how GVL processes visual information and makes value estimations in real time.
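When inspecting frame-by-frame predictions like these, one simple sanity check is whether the predicted values rise with time. The sketch below computes a Spearman-style rank correlation between the predicted values and the chronological frame order. It is an illustrative helper under that assumption, not the demo's own code, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def order_correlation(values: Sequence[float]) -> float:
    """Rank correlation between predicted values and chronological order.

    Returns a number in [-1, 1]; 1 means the predictions increase strictly
    with time, -1 means they decrease strictly with time.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two frames")
    time_ranks = [float(t + 1) for t in range(n)]
    value_ranks = average_ranks(values)
    mean = (n + 1) / 2  # both rank lists sum to n(n+1)/2
    cov = sum((a - mean) * (b - mean) for a, b in zip(time_ranks, value_ranks))
    var_t = sum((a - mean) ** 2 for a in time_ranks)
    var_v = sum((b - mean) ** 2 for b in value_ranks)
    if var_v == 0:
        return 0.0  # constant predictions carry no ordering information
    return cov / (var_t * var_v) ** 0.5


if __name__ == "__main__":
    print(order_correlation([0.0, 25.0, 50.0, 100.0]))
    print(order_correlation([100.0, 50.0, 25.0, 0.0]))
```

A score near 1 suggests the predictions track the video's temporal order, while a score near 0 or below suggests they do not.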
Generative Value Learning (GVL) represents a significant step forward in creating universal value function estimators for robotics. By leveraging the rich world knowledge embedded in VLMs, GVL can predict task progress with minimal data and strong generalization capabilities. This approach has the potential to streamline the development of intelligent robots that can learn, adapt, and improve across diverse tasks.
Original Sources
↗ https://generative-value-learning.github.io/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
8 November 2024