ZAYA1-8B: A High-Efficiency Open Reasoning Model Trained on AMD Instinct MI300 GPUs

The Engineer

7 May 2026

Zyphra's ZAYA1-8B is a compact open model, trained entirely on AMD's Instinct MI300 GPUs, that aims to compete with far larger systems and points toward more efficient AI development.

Even as major players like OpenAI and Anthropic continue to push the boundaries of AI with ever-larger models, a growing number of researchers are focusing on creating smaller, more efficient alternatives. One such effort comes from Palo Alto-based startup Zyphra, which recently released ZAYA1-8B, an 8-billion-parameter mixture-of-experts (MoE) language model that boasts competitive performance despite its relatively compact size.

ZAYA1-8B is notable not just for its efficiency but also for the platform it was trained on: AMD's Instinct MI300 GPUs. This is a significant milestone, suggesting that AMD's hardware can compete with Nvidia's in AI training, a market Nvidia currently dominates. The model is available under an Apache 2.0 license and can be downloaded from Hugging Face or tested on Zyphra Cloud.

What Makes ZAYA1-8B Stand Out

Zyphra's approach to building ZAYA1-8B centers on what the company calls "full-stack innovation," encompassing advancements in architecture, pretraining, and reinforcement learning (RL). Here are the key technical details:

MoE++ Architecture

ZAYA1-8B is built on Zyphra's own MoE++ architecture, which introduces two main changes to the standard MoE Transformer design:

Compressed Convolutional Attention (CCA): This mechanism addresses the memory limitations of traditional attention layers by performing sequence mixing in a compressed latent space. As a result, CCA reduces the KV-cache size by 8x compared to full multi-head attention, making it more efficient for long-context reasoning.
The ZAYA1 MLP Router: Unlike conventional MoE models, which use a single linear layer to route tokens, ZAYA1-8B uses a multilayer perceptron (MLP) as its router. This is intended to distribute tokens across "experts" more effectively, leading to improved performance and efficiency.
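
To make the two components above more concrete, here is a minimal Python sketch. It is illustrative only and is not Zyphra's implementation. The first part is back-of-the-envelope arithmetic for what an 8x KV-cache reduction means. The layer count, head count, head size, and context length are hypothetical values chosen for round numbers; they are not ZAYA1-8B's published configuration. The second part contrasts a conventional linear router with a generic two-layer MLP router followed by top-k expert selection. The ReLU activation, the weight shapes, and the helper names are all assumptions, since the article does not describe the internals of the ZAYA1 router.

```python
import math


def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, compression=1):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 counts keys and values. `compression` divides
    the cached size, e.g. 8 for the 8x reduction attributed to CCA.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return seq_len * per_token // compression


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_router(x, w):
    """Conventional router: one linear map from token to expert logits."""
    return matvec(w, x)


def mlp_router(x, w1, w2):
    """Generic MLP router (assumed form): linear, ReLU, linear."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]
    return matvec(w2, hidden)


def top_k_experts(logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in ranked)
    return [(i, probs[i] / norm) for i in ranked]


if __name__ == "__main__":
    # Hypothetical config: 32 layers, 32 KV heads of size 128, fp16, 32k tokens.
    full = kv_cache_bytes(32768, 32, 32, 128)
    compressed = kv_cache_bytes(32768, 32, 32, 128, compression=8)
    print(full / 1024**3, "GiB with full multi-head attention")  # 16.0
    print(compressed / 1024**3, "GiB with an 8x smaller cache")  # 2.0

    # Toy routing example: a 2-dimensional token and 3 experts.
    token = [1.0, -1.0]
    w1 = [[1.0, 0.0], [0.0, 1.0]]
    w2 = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
    print(top_k_experts(mlp_router(token, w1, w2), k=2))
```

Under these assumed numbers, the cache for a single 32k-token sequence drops from 16 GiB to 2 GiB, which is why cache compression matters for long-context reasoning. In the router half, the only difference between the two routers is the hidden layer and nonlinearity, which lets the routing decision depend on the token in a non-linear way.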

Training on AMD Instinct MI300 GPUs

ZAYA1-8B was trained entirely on AMD's Instinct MI300 GPUs, which have been gaining traction as an alternative to Nvidia's hardware. This training setup is significant for two reasons:

Scalability: The Instinct MI300 series offers high computational power and memory bandwidth, making it suitable for large-scale model training.
Cost Efficiency: AMD's GPUs are often more cost-effective than their Nvidia counterparts, potentially lowering the barrier to entry for smaller labs and independent developers.

Performance Benchmarks

Despite its smaller size, ZAYA1-8B holds its own against larger models like GPT-5-High and DeepSeek-V3.2 on various third-party benchmarks. This performance is a testament to the effectiveness of Zyphra’s full-stack innovation approach.

Key Takeaways

ZAYA1-8B represents a promising step forward in the development of efficient, high-performance language models. Here are the key takeaways for practitioners and researchers:

Efficiency Matters: Smaller, more efficient models can offer competitive performance while reducing computational costs and environmental impact.
AMD’s Viability: The success of ZAYA1-8B on AMD Instinct MI300 GPUs demonstrates that this platform is a viable alternative to Nvidia for AI training, potentially diversifying the hardware landscape.
Open Source Availability: By releasing ZAYA1-8B under an Apache 2.0 license, Zyphra empowers the community to experiment and build upon their work, fostering collaboration and innovation.

Zyphra's ZAYA1-8B is a compelling example of how innovative architecture and efficient training can lead to powerful AI models without the need for massive computational resources. As the field continues to evolve, we can expect more such efforts that balance performance with practicality.