
By partnering with DigitalOcean and using AMD GPUs, Character.ai doubled its production inference throughput, a gain that matters for serving 20 million users who expect seamless, low-latency interactions.
In a significant technical collaboration, Character.ai, DigitalOcean, and AMD have optimized GPU workloads to double production inference throughput for the AI entertainment platform. The gain matters because Character.ai serves around 20 million users worldwide and needs low-latency performance at scale.
Character.ai's application needs high-performance GPUs to handle large-scale, low-latency inference. To meet these requirements, the company worked with DigitalOcean and AMD to optimize the Qwen3-235B Instruct FP8 model on a cluster of AMD Instinct™ MI325X GPUs.
The teams focused on several key areas to achieve this performance boost.
Character.ai uses multiple models, including Qwen and Mistral. This deep dive focuses on optimizing the Qwen3-235B Instruct FP8 model on a cluster of DigitalOcean Droplets with AMD GPUs.

The primary objective was to serve Qwen3-235B on AMD Instinct™ MI325X GPUs under a workload of 5,600 input tokens and 140 output tokens per request (input / output sequence length, ISL / OSL = 5600 / 140). The goal was to maximize request throughput, measured in queries per second (QPS), per 8-GPU MI325X server while meeting strict latency and concurrency constraints.
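To make the workload concrete, here is a back-of-the-envelope model that converts a request rate into the token rates it implies for the 5600 / 140 shape, and uses Little's law (average in-flight requests = arrival rate × average time in system) to relate QPS, latency, and concurrency. It is a minimal sketch, not Character.ai's, DigitalOcean's, or AMD's tooling; the function names and the sample QPS and latency values are hypothetical, since the article publishes neither figure.

```python
from dataclasses import dataclass

# Workload shape from the article: ISL / OSL = 5600 / 140 tokens per request.
ISL_TOKENS = 5600
OSL_TOKENS = 140
GPUS_PER_SERVER = 8  # one MI325X 8-GPU server


@dataclass
class ServerLoad:
    qps: float                          # requests per second on one server
    prefill_tokens_per_s: float         # input tokens processed per second
    decode_tokens_per_s: float          # output tokens generated per second
    decode_tokens_per_s_per_gpu: float  # decode rate spread evenly over 8 GPUs
    in_flight_requests: float           # average concurrency via Little's law


def server_load(qps: float, mean_latency_s: float) -> ServerLoad:
    """Translate a request rate into token rates and average concurrency.

    Hypothetical helper: qps and mean_latency_s are illustrative inputs.
    """
    prefill = qps * ISL_TOKENS
    decode = qps * OSL_TOKENS
    return ServerLoad(
        qps=qps,
        prefill_tokens_per_s=prefill,
        decode_tokens_per_s=decode,
        decode_tokens_per_s_per_gpu=decode / GPUS_PER_SERVER,
        # Little's law: L = lambda * W
        in_flight_requests=qps * mean_latency_s,
    )


def max_qps_under_concurrency_cap(max_concurrency: int, mean_latency_s: float) -> float:
    """Upper bound on sustainable QPS when at most max_concurrency requests
    may be in flight and mean end-to-end latency is mean_latency_s."""
    return max_concurrency / mean_latency_s


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only.
    print(server_load(qps=2.0, mean_latency_s=3.0))
    print(max_qps_under_concurrency_cap(max_concurrency=64, mean_latency_s=4.0))
```

At this shape each request carries 40 input tokens per output token, so prefill accounts for most of the tokens processed. The concurrency bound assumes mean latency stays fixed; on a real server latency rises as concurrency grows, which is why the QPS ceiling has to be measured against the latency target rather than read off the formula.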
The optimizations doubled production inference throughput.
These performance gains have not only improved the user experience on Character.ai but also resulted in significant cost savings. The collaboration between Character.ai, DigitalOcean, and AMD has led to a multi-year, eight-figure annual agreement for GPU infrastructure, reflecting the success of this technical partnership.
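The link between throughput and cost follows from simple arithmetic: at a fixed hourly price per server, the cost of serving a request is inversely proportional to QPS, so doubling throughput halves it. The sketch below assumes a steady request rate at full utilization; the function name, the server price, and the QPS values are hypothetical placeholders, since the article discloses neither pricing nor throughput figures.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_requests(server_cost_per_hour: float, qps: float) -> float:
    """Cost of serving one million requests on one server at a steady rate.

    Hypothetical helper: both inputs are illustrative, in arbitrary cost units.
    """
    if qps <= 0:
        raise ValueError("qps must be positive")
    requests_per_hour = qps * SECONDS_PER_HOUR
    return server_cost_per_hour / requests_per_hour * 1_000_000


# Hypothetical illustration: same server price, throughput doubled from 2 to 4 QPS.
baseline = cost_per_million_requests(server_cost_per_hour=100.0, qps=2.0)
optimized = cost_per_million_requests(server_cost_per_hour=100.0, qps=4.0)
print(f"baseline: {baseline:.2f}, optimized: {optimized:.2f}, ratio: {optimized / baseline:.2f}")
```

The model ignores idle capacity and any change in server price, so it shows the direction and scale of the saving rather than its actual size.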
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
The Engineer, This Week's Edition, 15 January 2026