Fine-Tuning Llama 3.1 405B with Axolotl on a Lambda 1-Click Cluster

Models & Research

The Engineer

13 Sept 2024 · 3 min read

Learn how to fine-tune Meta's powerful open-source Llama 3.1 405B model with Axolotl on a scalable Lambda 1-Click Cluster, enhancing its capabilities for advanced AI tasks.

Llama 3.1 405B, recently released by Meta, has quickly established itself as one of the premier open-source models capable of competing with proprietary frontier models. This is evident from its performance on various benchmarks and the LMSYS Chatbot Arena Leaderboard. In this tutorial, we'll walk through how to fine-tune the Llama 3.1 405B model using Axolotl on a multi-node Lambda 1-Click Cluster (1CC). You can also find the code and examples in the Axolotl Cookbook.

What We’ll Cover

Setting up Axolotl and its required environment.
Fine-tuning the Llama 3.1 405B base model using Maxime Labonne’s FineTome dataset: mlabonne/FineTome-100k.

Prerequisites

Before you start, ensure you have the following:

A Lambda Cloud account.
A working 1-Click Cluster. We recommend at least 64 H100 GPUs for full parameter fine-tuning of the 405B model.

Setting Up Axolotl

Once your 1-Click cluster is set up, we can begin installing and setting up Axolotl. Here’s a step-by-step guide:

Step 1: Reserve and Set Up Your 1-Click Cluster

Detailed instructions on how to reserve and set up a 1-Click cluster are available in the Lambda Labs documentation. Ensure you have at least 64 H100 GPUs for optimal performance.

Step 2: Configure SSH

Lambda provisions several head nodes to act as both a jump proxy and coordinator for each of the worker/GPU nodes. To set up inter-node passwordless SSH, follow these steps:

Generate an SSH Key Pair (if you haven’t already):

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

Add Your Public Key to Lambda Cloud: Go to your Lambda Cloud dashboard and add your public key.

Set Up SSH Config: Add the following to your ~/.ssh/config file:

Host lambda-1cc
    HostName <your_cluster_head_node_ip>
    User ubuntu
    IdentityFile ~/.ssh/id_rsa
    ForwardAgent yes

Run Lambda’s SSH Setup Script: Go to the 1-Click cluster dashboard, click "SETUP" next to "SSH LOGIN," and follow the instructions provided.

Fine-Tuning Llama 3.1 405B

Now that your environment is set up, let's dive into fine-tuning the Llama 3.1 405B model using Axolotl.

Step 1: Clone the Axolotl Repository

git clone https://github.com/axolotl-ai-cloud/axolotl.git
cd axolotl

Step 2: Install Dependencies

Axolotl requires several dependencies. You can install them using pip:

pip install -r requirements.txt

Step 3: Download the FineTome Dataset

Download and prepare the FineTome dataset from Hugging Face:

python scripts/download_dataset.py --dataset mlabonne/FineTome-100k --output_dir data/finetome

Step 4: Configure the Fine-Tuning Script

Create a configuration file for fine-tuning. Here’s an example config.yaml:

model:
  name: meta-llama/Meta-Llama-3.1-405B
  max_length: 2048

data:
  path: data/finetome
  batch_size: 8
  num_workers: