
Cerebras Systems and AWS are revolutionizing cloud AI with lightning-fast Cerebras CS-3 systems, offering unparalleled speed for large language models and paving the way for a new era of disaggregated computing architectures.
Cerebras Systems, the leader in high-speed AI inference, is teaming up with Amazon Web Services (AWS) to bring unprecedented speed and performance to cloud-based AI models. Starting today, AWS customers will have access to Cerebras CS-3 systems via Amazon Bedrock, enabling them to run leading open-source large language models (LLMs) and Amazon's Nova models at the industry's highest inference speeds.
AI is rapidly transforming software development, with AI agents increasingly taking over tasks that were traditionally done by human developers. This shift changes the computational requirements for AI inference. Compared with conversational chat, agentic coding generates approximately 15 times more tokens per query, and it needs those tokens delivered quickly to keep developers productive. As a result, there is an urgent need for faster inference across the industry.
Cerebras has been at the forefront of this movement, powering models from OpenAI, Cognition, and Meta with speeds of up to 3,000 tokens per second. By bringing this technology to AWS, one of the world’s leading cloud providers, the collaboration aims to meet the growing demand for fast inference on a global scale.
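To make the 15x figure concrete, here is a rough back-of-the-envelope sketch of how long one agentic query spends streaming output at different decode rates. Only the 15x multiplier and the 3,000 tokens-per-second peak come from the article; the 500-token chat baseline and the 100 tokens-per-second comparison rate are assumptions chosen purely for illustration.

```python
# Illustrative arithmetic only. The chat baseline and the slower rate are
# assumptions, not numbers from the announcement.
CHAT_TOKENS_PER_QUERY = 500   # assumed size of a typical chat reply
AGENTIC_MULTIPLIER = 15       # "approximately 15 times more tokens" (from the article)


def decode_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` output tokens at a steady decode rate."""
    return tokens / tokens_per_second


agentic_tokens = CHAT_TOKENS_PER_QUERY * AGENTIC_MULTIPLIER

# 100 tok/s is an assumed comparison point; 3,000 tok/s is the article's peak figure.
for rate in (100, 3000):
    seconds = decode_seconds(agentic_tokens, rate)
    print(f"{rate:>5} tok/s: {seconds:.1f} s per agentic query")
```

Under these assumptions the same query takes over a minute at the slower rate and a few seconds at the faster one, which is the gap the article is pointing at when it talks about keeping developers productive.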
To achieve even higher performance, AWS and Cerebras are collaborating on a novel disaggregated architecture that pairs AWS Trainium with the Cerebras Wafer-Scale Engine (WSE). This approach plays to the strengths of both systems and, according to the companies, delivers 5 times more high-speed token capacity in the same hardware footprint.
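The article does not describe how the two systems hand work to each other, so the following is only a toy sketch of the general idea behind disaggregated inference: one stage processes the whole prompt once (prefill), a second stage produces output tokens one at a time (decode), and some state passes between them. Every class and function name here is hypothetical, and no real model, Trainium API, or Cerebras API is involved.

```python
from dataclasses import dataclass


@dataclass
class KVCache:
    """Hypothetical stand-in for the attention state that prefill produces."""
    prompt_tokens: list[str]


class PrefillWorker:
    """Plays the role the article assigns to Trainium: read the prompt once."""

    def run(self, prompt: str) -> KVCache:
        # A real system would run the model over all prompt tokens in parallel.
        return KVCache(prompt_tokens=prompt.split())


class DecodeWorker:
    """Plays the role the article assigns to the WSE: emit tokens sequentially."""

    def run(self, cache: KVCache, max_new_tokens: int) -> list[str]:
        # A real decoder would sample from a model. This toy echoes the prompt
        # tokens in reverse so the data flow between stages is visible.
        reversed_prompt = list(reversed(cache.prompt_tokens))
        return reversed_prompt[:max_new_tokens]


def generate(prompt: str, max_new_tokens: int) -> list[str]:
    """Run prefill on one worker, then hand its state to a separate decode worker."""
    cache = PrefillWorker().run(prompt)
    return DecodeWorker().run(cache, max_new_tokens)


print(generate("fix the failing test", 2))
```

The point of the split is that the two stages have different hardware profiles, so each can be scaled and placed independently. How the state is actually transferred between Trainium and the WSE is not covered in the source.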

Trainium for Prefill:
Cerebras WSE for Decode:
Disaggregated Configuration:
Performance Benchmarks:
The collaboration between AWS and Cerebras represents a major step forward in AI inference technology. By combining the strengths of Trainium and WSE, this disaggregated architecture not only meets but exceeds the growing demand for fast and efficient AI processing. For developers and businesses relying on AI-driven applications, this means more productive workflows, faster development cycles, and ultimately, better end-user experiences.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
16 March 2026