
This article explores how Huawei's Ascend backend was seamlessly integrated with Torchtune, enhancing AI training on NPU hardware and unlocking new possibilities for large language model optimization.
By Chenguang Li and Mengqing Cao (Huawei PyTorch Team)
In this article, we’ll dive into how the Ascend backend has been integrated with Torchtune, a PyTorch-native library designed for fine-tuning Large Language Models (LLMs). We'll explore the technical changes that enable this integration and why it matters for AI practitioners.
Torchtune is a powerful tool in the PyTorch ecosystem, aimed at simplifying the fine-tuning of LLMs. It adheres to PyTorch’s principles by offering composable and modular building blocks, as well as easily extensible training recipes. This makes it an ideal choice for developers who need flexibility and control over their model training processes.
Torchtune's documentation and example recipes are useful for both beginners and advanced users, offering detailed examples and best practices for optimizing model training pipelines.
Ascend is a series of AI computing products by Huawei, designed to provide a full-stack AI infrastructure. This includes processors, hardware, foundational software, AI frameworks, development tools, and industry-specific applications. The Ascend platform is known for its efficiency and scalability, making it suitable for a wide range of AI workloads.

Initially, devices were matched using simple device strings (e.g., "cuda:0"). This approach ties the code to a single backend and adapts poorly to different environments. To address this, torchtune introduced an abstraction layer for devices that uses the _get_device_support() method to retrieve the relevant device based on the current environment.
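The idea of replacing hard-coded device strings with an environment-driven lookup can be sketched as follows. This is a minimal illustration and not torchtune's actual implementation: `DeviceInfo`, `DeviceSupport`, `detect_device_support`, and the `probes` argument are hypothetical names chosen for this example, and the backend strings ("gloo", "nccl", "hccl") are the ones conventionally paired with each device type. The sketch assumes that importing `torch_npu`, when it is installed, makes `torch.npu.is_available()` usable.

```python
from enum import Enum
from typing import Callable, Dict, NamedTuple, Optional


class DeviceInfo(NamedTuple):
    device_type: str            # string accepted by torch.device(...)
    device_name: str            # human-readable label
    communication_backend: str  # distributed backend usually paired with it


class DeviceSupport(Enum):
    # Hypothetical registry of device kinds for this sketch.
    CPU = DeviceInfo("cpu", "CPU", "gloo")
    CUDA = DeviceInfo("cuda", "GPU", "nccl")
    NPU = DeviceInfo("npu", "NPU", "hccl")


Probes = Dict[DeviceSupport, Callable[[], bool]]


def _torch_probes() -> Probes:
    """Build availability checks from whatever is installed locally."""
    try:
        import torch
    except ImportError:
        return {}  # no PyTorch at all: only CPU can be reported
    probes: Probes = {DeviceSupport.CUDA: torch.cuda.is_available}
    try:
        import torch_npu  # noqa: F401  (assumed to expose torch.npu)
        probes[DeviceSupport.NPU] = torch.npu.is_available
    except ImportError:
        pass  # Ascend extension not installed
    return probes


def detect_device_support(probes: Optional[Probes] = None) -> DeviceSupport:
    """Return the first available accelerator, falling back to CPU."""
    if probes is None:
        probes = _torch_probes()
    for candidate in (DeviceSupport.NPU, DeviceSupport.CUDA):
        probe = probes.get(candidate)
        if probe is not None and probe():
            return candidate
    return DeviceSupport.CPU


if __name__ == "__main__":
    support = detect_device_support()
    print(support.value.device_type, support.value.communication_backend)
```

Passing the probes in as an argument keeps the lookup testable on machines without any accelerator. Calling code reads `device_type` and `communication_backend` from the returned entry instead of embedding a string such as "cuda:0".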
The _get_device_support() method detects available devices at runtime, so the same code can switch between different types of hardware without manual configuration.
Device Registration:
Backend Integration:
Performance Benchmarks:
The integration of the Ascend backend with Torchtune represents a significant step forward in the PyTorch ecosystem. By combining Ascend AI computing hardware with flexible device management, developers can achieve better performance and scalability for their model training processes. This integration opens up new possibilities for AI practitioners.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
13 January 2025