
Cloudflare unveils Omni, a platform that runs multiple AI models on fewer GPUs to improve efficiency and conserve scarce GPU capacity.
2025-08-27
By Sven Sauleau and Mari Galicer
As the demand for AI products continues to grow, developers are creating a wider variety of models. At Cloudflare, we've been adding new models to our growing catalog on Workers AI, but noticed that not all models are used equally, leaving infrequently used ones occupying valuable GPU space. Given that efficiency is a core value at Cloudflare and GPUs are a scarce resource, we built Omni, an internal platform designed to maximize GPU usage by running and managing multiple models on a single machine.
Omni introduces several key innovations:
At a high level, Omni is a platform for running AI models. Here’s how it processes inference requests:

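Traditionally, each AI model would run in its own container or VM with a dedicated GPU. This setup is straightforward but inefficient at scale: every model carries the overhead of its own stack, and an infrequently used model still occupies a whole GPU. Omni addresses this by running and managing multiple models on a single machine.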
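To make the contrast between one GPU per model and shared GPUs concrete, here is a minimal sketch. It is not Omni's scheduler: the article does not describe how Omni decides placement, so the example assumes memory footprint is the only constraint and uses a simple first-fit-decreasing packing. The model names, sizes, and the 80 GB capacity are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """A single GPU with a fixed memory budget (hypothetical model)."""
    capacity_gb: float
    used_gb: float = 0.0
    models: list[str] = field(default_factory=list)

    def fits(self, size_gb: float) -> bool:
        return self.used_gb + size_gb <= self.capacity_gb


def dedicated_gpus(models: dict[str, float]) -> int:
    """Traditional setup: every model gets its own GPU."""
    return len(models)


def pack_models(models: dict[str, float], capacity_gb: float) -> list[Gpu]:
    """Co-locate models using first-fit decreasing on memory footprint.

    Assumption: memory is the only constraint. A real platform would also
    weigh compute contention, isolation, and traffic patterns.
    """
    gpus: list[Gpu] = []
    # Place the largest models first, each on the first GPU with room.
    for name, size_gb in sorted(models.items(), key=lambda kv: kv[1], reverse=True):
        if size_gb > capacity_gb:
            raise ValueError(f"{name} does not fit on a single GPU")
        target = next((g for g in gpus if g.fits(size_gb)), None)
        if target is None:
            target = Gpu(capacity_gb)
            gpus.append(target)
        target.models.append(name)
        target.used_gb += size_gb
    return gpus


# Hypothetical catalog: model name -> memory footprint in GB.
MODELS = {
    "model-a": 40.0,
    "model-b": 24.0,
    "model-c": 16.0,
    "model-d": 8.0,
    "model-e": 6.0,
}

packed = pack_models(MODELS, capacity_gb=80.0)
print(f"dedicated: {dedicated_gpus(MODELS)} GPUs, shared: {len(packed)} GPUs")
for i, gpu in enumerate(packed):
    print(f"GPU {i}: {gpu.models} ({gpu.used_gb:.0f}/{gpu.capacity_gb:.0f} GB)")
```

With these made-up numbers, five dedicated GPUs collapse to two shared ones. The saving depends entirely on the mix of model sizes, and in practice on how often each model is actually used.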
Omni represents a significant step forward in how Cloudflare manages AI models on our edge nodes. By maximizing GPU usage and improving model availability, we can provide better services to developers and users alike. As AI continues to evolve, platforms like Omni will be crucial in ensuring that resources are used efficiently and effectively.