Running and Fine-Tuning Google’s Gemma 3n Multimodal Model Locally with Unsloth Studio

The Engineer

4 Jul 2025

Unsloth Studio's new tools allow developers to run Google’s versatile Gemma 3n model locally, enabling fine-tuning on diverse data without cloud dependencies.

Google's Gemma 3n is a multimodal model that accepts image, audio, video, and text inputs. It comes in two sizes, 2B (E2B) and 4B (E4B), and supports 140 languages for both text and multimodal tasks. The latest update from Unsloth Studio makes it possible to run and fine-tune Gemma 3n locally with its tools, which matters for practitioners who want to use the model without relying on cloud infrastructure.

Key Features of Gemma 3n

Multimodal Capabilities: Handles images, audio, video, and text.
Language Support: Supports 140 languages.
Context Length: 32K tokens.
Audio Input: Up to 30 seconds.
OCR and ASR: Optical character recognition (OCR) and automatic speech recognition (ASR).
Speech Translation: Via prompts.
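
The 30-second audio limit is easy to trip over when preparing inputs. The following is a minimal sketch of a pre-flight check, assuming the clips are uncompressed WAV files; the helper names and the `MAX_AUDIO_SECONDS` constant are hypothetical and not part of Gemma 3n or Unsloth Studio.

```python
import wave

# Limit quoted in the feature list; hypothetical constant name.
MAX_AUDIO_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_audio_limit(path: str, limit: float = MAX_AUDIO_SECONDS) -> bool:
    """True if the clip is short enough to pass as a single audio input."""
    return wav_duration_seconds(path) <= limit
```

Longer recordings would need to be split into chunks before being passed to the model. How best to split them depends on the task and is not covered here.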

Running Gemma 3n Locally

Unsloth Studio has made it straightforward to run Gemma 3n locally. The model is available in several configurations, each optimized for different use cases:

Dynamic 2.0 GGUF (Text Only): Suitable for text-based tasks.
Dynamic 4-bit Instruct: Ideal for fine-tuning.
16-bit Instruct: General-purpose configuration.

You can download the pre-trained models from Hugging Face:

2B (E2B) Models:
4B (E4B) Models:

For a comprehensive list of all available Gemma 3n models, including base and other formats, check out the Unsloth collection on Hugging Face.

Fine-Tuning Gemma 3n

Fine-tuning adapts the model to a specific task or dataset. Unsloth Studio provides a free Colab notebook to get you started:

Colab Notebook: Gemma 3N (4B) - Conversational
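
For readers who want a feel for the setup before opening the notebook, here is a rough sketch of loading the model in 4-bit and attaching LoRA adapters. It is not taken from the notebook. The repository naming pattern `unsloth/gemma-3n-<size>-it`, the LoRA hyperparameters, and the helper names are assumptions, and the `FastModel` calls reflect Unsloth's Python API as commonly documented, so check them against the notebook before relying on them.

```python
def gemma3n_load_kwargs(size: str = "E4B", four_bit: bool = True,
                        max_seq_length: int = 1024) -> dict:
    """Build loader arguments for an instruct checkpoint (hypothetical helper)."""
    if size not in ("E2B", "E4B"):
        raise ValueError(f"unknown Gemma 3n size: {size!r}")
    return {
        # Assumed repository naming; verify against the Unsloth collection.
        "model_name": f"unsloth/gemma-3n-{size}-it",
        "max_seq_length": max_seq_length,
        "load_in_4bit": four_bit,   # 4-bit weights reduce memory use for fine-tuning
        "full_finetuning": False,   # train adapters, not all weights
    }


def load_for_lora(kwargs: dict):
    """Load the model and wrap it with LoRA adapters (needs unsloth and a GPU)."""
    from unsloth import FastModel  # imported lazily so the helper above runs anywhere

    model, tokenizer = FastModel.from_pretrained(**kwargs)
    model = FastModel.get_peft_model(
        model,
        finetune_vision_layers=False,      # text-only conversational tuning
        finetune_language_layers=True,
        finetune_attention_modules=True,
        finetune_mlp_modules=True,
        r=8, lora_alpha=8, lora_dropout=0,  # illustrative values, not tuned
        bias="none", random_state=3407,
    )
    return model, tokenizer
```

Calling `load_for_lora(gemma3n_load_kwargs("E2B"))` would target the smaller model. Training itself (dataset formatting, trainer configuration) is left to the notebook.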

The notebook includes step-by-step instructions and code for fine-tuning Gemma 3n. For inference, the officially recommended settings are:

Temperature: 1.0
Top-K: 64
Min-P: 0.0 (optional, but 0.01 works well; llama.cpp default is 0.1)
Top-P: 0.95
Repetition Penalty: 1.0
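
To make these knobs concrete, the sketch below applies temperature, Top-K, Top-P, and Min-P to a list of raw logits in plain Python. It is a simplified illustration, not the code any particular runtime uses: the function name is hypothetical, the order of the filters is an assumption (real samplers differ), and the repetition penalty is omitted because a value of 1.0 leaves the logits unchanged.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=64, top_p=0.95, min_p=0.0):
    """Return {token_index: probability} after the four sampling filters."""
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # Top-K: keep only the K most likely tokens, then renormalize.
    probs = probs[:top_k]
    mass = sum(p for p, _ in probs)
    probs = [(p / mass, i) for p, i in probs]

    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # Min-P: drop tokens less likely than min_p times the most likely token.
    threshold = min_p * kept[0][0]
    kept = [(p, i) for p, i in kept if p >= threshold]

    # Renormalize the survivors so they sum to 1.
    norm = sum(p for p, _ in kept)
    return {i: p / norm for p, i in kept}
```

With `min_p=0.0` the Min-P step removes nothing, which is why the setting is listed as optional. Raising it to 0.01 trims only tokens that are at least 100 times less likely than the top candidate.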

Fixes and Technical Analysis

Unsloth Studio has fixed several issues that prevented GGUFs from working properly in Ollama. If you use Ollama, redownload the models so that you pick up the fixes.

For a deeper dive into the technical details and fixes, refer to the [Fixes + Technical Analysis](https://unsloth.ai/docs/models/tutorials/gemma-3-how-to-run-and-fine-t