
As AI models grow in complexity, the need for a dynamic and real-time data layer is becoming crucial. Here’s how a new web data infrastructure can bridge the gap.
AI is booming, and its use cases keep multiplying. But to capitalize on the technology, enterprises need more than powerful models; they need large volumes of high-quality, up-to-date data. The challenge is that much of the relevant information is either blocked or unstructured, which makes it hard for AI systems to use.
Consider the web itself. It was not designed for automated discovery and retrieval, and that is a significant barrier for modern AI applications. Working around this design constraint requires a new layer of infrastructure that can navigate a vast, constantly changing web, get past technical barriers to access, and deliver information in real time.
The next frontier in AI may well depend on a robust web data infrastructure layer. This layer must be capable of handling hundreds of millions of existing web domains and billions of new URLs created each week. Its primary function is to provide real-time, context-rich information that can keep AI models grounded in current and verifiable data.
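To put "billions of new URLs created each week" in throughput terms, take one billion per week as an assumed lower bound (the article gives no exact figure). Spread evenly over a week of 604,800 seconds, that is:

```latex
\frac{10^{9}\ \text{URLs}}{7 \times 86{,}400\ \text{s}} = \frac{10^{9}\ \text{URLs}}{604{,}800\ \text{s}} \approx 1{,}650\ \text{new URLs per second}
```

That rate covers only newly created URLs, before any re-fetching of pages across the hundreds of millions of existing domains, so real-time coverage needs sustained capacity well above it.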
Or Lenchner, CEO of Bright Data, a leading web data collection platform, emphasizes this point: “The data suggests there’s far more data out there. Think of the universe: It’s out there, but you don’t know what you don’t know.” The analogy underscores both the vast potential of untapped web data and the need for infrastructure to harness it.

Traditional model training relies on static snapshots of information collected at a particular point in time. While this approach was sufficient for early AI breakthroughs driven by scaling training data and model size, it is no longer adequate. To track dynamic factors such as competitor pricing, consumer sentiment, and market trends, companies need a constant feed of new information.
Speed is not just a matter of convenience; it’s essential. Today’s business environments are characterized by continuous changes in prices, inventory, markets, security threats, and customer behavior. Delayed data retrieval can significantly reduce the usefulness of even the most sophisticated models. Live, high-quality web data can also help reduce AI hallucinations, because the model has a more relevant, current knowledge base to draw from.
In practice, organizations need infrastructure that can handle millions of simultaneous interactions across websites with different access rules and data formats, while keeping retrieval fast, reliable, and context-aware. With that in place, they can get far more out of their AI systems and base decisions on current, verifiable information.
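As a rough illustration of two of those requirements, respecting each site's access rules and bounding how many requests are in flight at once, here is a minimal Python sketch using only the standard library. It is not how any particular vendor works. It treats robots.txt as the only "access rule" (real systems also deal with rate limits, terms of service, and per-host politeness), and the host names, robots.txt bodies, `USER_AGENT`, and helper names (`build_parsers`, `allowed`, `fetch_all`) are all hypothetical. The actual HTTP client is passed in as a `fetch` coroutine, so a stub can stand in for it.

```python
import asyncio
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies keyed by host. A real crawler would fetch
# https://<host>/robots.txt and cache the parsed result per host.
ROBOTS_TXT = {
    "example.com": "User-agent: *\nDisallow: /private/\n",
    "example.org": "User-agent: *\nDisallow: /\n",
}

USER_AGENT = "example-data-bot"  # hypothetical crawler name


def build_parsers(robots_by_host):
    """Parse each host's robots.txt once and keep the parser per host."""
    parsers = {}
    for host, body in robots_by_host.items():
        rp = RobotFileParser()
        rp.parse(body.splitlines())
        parsers[host] = rp
    return parsers


def allowed(parsers, url):
    """True only if the URL's host has known rules that permit this agent."""
    rp = parsers.get(urlsplit(url).hostname)
    # Hosts with no known rules are skipped rather than assumed open.
    return rp is not None and rp.can_fetch(USER_AGENT, url)


async def fetch_all(urls, fetch, parsers, max_concurrency=10):
    """Fetch permitted URLs with at most `max_concurrency` requests in flight.

    Returns {url: body}, with None for URLs skipped by access rules.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(url):
        if not allowed(parsers, url):
            return url, None
        async with sem:  # caps global in-flight requests
            return url, await fetch(url)

    results = await asyncio.gather(*(worker(u) for u in urls))
    return dict(results)


if __name__ == "__main__":
    async def fake_fetch(url):
        # Stand-in for a real HTTP client call; no network access here.
        await asyncio.sleep(0.01)
        return f"<html>{url}</html>"

    urls = [
        "https://example.com/products/1",
        "https://example.com/private/report",
        "https://example.org/news",
    ]
    results = asyncio.run(
        fetch_all(urls, fake_fetch, build_parsers(ROBOTS_TXT), max_concurrency=2)
    )
    for url, body in results.items():
        print(url, "->", "skipped" if body is None else body)
```

A single semaphore is the simplest way to cap total load. A production system would more likely keep one limit per host, so that a slow or strict site cannot starve the rest of the crawl.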
The emergence of the web data infrastructure layer marks a significant step forward in the evolution of AI. As the technology matures, it is likely to play an increasingly critical role in connecting the vast amount of available web data to the models that can put it to use.
Original Sources
The emergence of the web data infrastructure layer for AI
↗ https://www.technologyreview.com/2026/06/24/1139202/the-emergence-of-the-web-data-infrastructure-layer-for-ai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
This Week's Edition of The Engineer: 29 June 2026
© 2026 Cedar & Bloom. All rights reserved.