
Discover how machine learning compilation tricks enable running powerful GPU-accelerated large language models on a budget Orange Pi 5, transforming affordable hardware into a potent AI tool.
Apr 20, 2024
Large language models (LLMs) are evolving rapidly, but their computational demands usually call for expensive hardware. Recent advances in machine learning compilation (MLC) make it possible to run LLMs on affordable embedded devices. In this article, we explore how to get GPU-accelerated LLM inference on a $100 Orange Pi 5 with a Mali-G610 GPU. Specifically, we’ll see how MLC techniques can deliver impressive results for models like Llama-3-8B, Llama-2-7B, and RedPajama-3B.
The key technical advancement here is the successful deployment of MLC on a Mali GPU. This is significant because:
MLC leverages Apache TVM Unity, a generalizable stack for compiling and optimizing machine learning models across different hardware backends. Here’s a breakdown of the process:
If you want to try this out on your own Orange Pi 5, follow these steps:
Set Up the Board:
Clone MLC-LLM Repository:

git clone https://github.com/mlc-ai/mlc-llm.git
cd mlc-llm
Download the Model Weights: this walkthrough uses Llama-3-8B-Instruct-q4f16_1-MLC. You can also use Llama-2-7b-chat-hf-q4f16_1 or Llama-2-13b-chat-hf-q4f16_1 (requires a 16 GB board).
python scripts/download_weights.py Llama-3-8B-Instruct-q4f16_1-MLC
pip install -r requirements.txt
python scripts/run_model.py --model Llama-3-8B-Instruct-q4f16_1-MLC
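To see why the 13B variant calls for a 16 GB board, it helps to estimate how much memory the quantized weights alone occupy. The sketch below is a back-of-the-envelope calculation, not part of MLC-LLM. It assumes that q4f16_1 stores 4-bit weights plus one 16-bit scale per group of 32 weights, and it uses the nominal parameter counts from the model names rather than exact counts. The function names are hypothetical.

```python
# Rough estimate of weight memory for group-quantized models.
# Assumption: q4f16_1 = 4-bit weights + one fp16 scale per group of 32 weights.

GIB = 1024 ** 3


def effective_bits_per_weight(weight_bits=4, scale_bits=16, group_size=32):
    """Average storage cost per weight, including the shared per-group scale."""
    return weight_bits + scale_bits / group_size


def weight_memory_gib(num_params, bits_per_weight):
    """Memory taken by the weights alone, in GiB (no KV cache, no activations)."""
    return num_params * bits_per_weight / 8 / GIB


# Nominal sizes read off the model names; real parameter counts differ slightly.
MODELS = {
    "Llama-3-8B": 8e9,
    "Llama-2-7b": 7e9,
    "Llama-2-13b": 13e9,
}

if __name__ == "__main__":
    bpw = effective_bits_per_weight()
    for name, num_params in MODELS.items():
        print(f"{name}: ~{weight_memory_gib(num_params, bpw):.1f} GiB of weights")
```

Under these assumptions the 13B model needs roughly 6.8 GiB for weights alone. The Mali GPU shares system RAM with the CPU, so the KV cache, activations, and the operating system all compete for the same memory, which leaves very little headroom on an 8 GB board.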
Here are some benchmarks for different models on the Orange Pi 5:
The ability to run GPU
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.