Microsoft Unveils Three New Phi 3.5 Models, Outperforming Google and OpenAI on Key Benchmarks

Models & Research

The Engineer

26 Aug 2024 · 3 min read

Microsoft's latest Phi 3.5 models outshine competitors with advanced capabilities in reasoning and multimodal analysis, offered freely on Hugging Face under a permissive license, sparking new possibilities for AI developers worldwide.

Microsoft has once again upped the ante in the AI landscape with the release of three new models in its evolving Phi series. These models-Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct, and Phi-3.5-vision-instruct-are designed to tackle a range of tasks from fast reasoning to advanced multimodal analysis. Each model is available on Hugging Face under a permissive Microsoft-branded MIT License, allowing for commercial usage and modification.

Key Highlights

Phi-3.5-mini-instruct: 3.82 billion parameters, optimized for compute-constrained environments.
Phi-3.5-MoE-instruct: 41.9 billion parameters, a Mixture of Experts model for powerful reasoning.
Phi-3.5-vision-instruct: 4.15 billion parameters, specialized for vision tasks.

Phi-3.5 Mini Instruct: Lightweight and Efficient

The Phi-3.5-mini-instruct is a lightweight model with 3.82 billion parameters, tailored for instruction adherence and supporting a 128k token context length. This makes it ideal for environments where memory or compute resources are limited, such as edge devices or low-power servers.

Key Features:
- Instruction Adherence: Enhanced ability to follow complex instructions.
- Long Context Support: Capable of handling long sequences up to 128k tokens.
- Versatility: Suitable for tasks like code generation, mathematical problem solving, and logic-based reasoning.

Despite its compact size, the Phi-3.5-mini-instruct demonstrates competitive performance in multilingual and multi-turn conversational tasks. It outperforms similarly-sized models like Llama-3.1-8B-instruct and Mistral-7B-instruct on the RepoQA benchmark, which measures long context code understanding.

Phi-3.5 MoE: Mixture of Experts for Advanced Reasoning

The Phi-3.5-MoE (Mixture of Experts) model is a significant addition to Microsoft's AI arsenal. With 41.9 billion parameters, it leverages the Mixture of Experts approach to achieve powerful reasoning capabilities.

Key Features:
- Mixture of Experts: Combines multiple smaller models to handle different aspects of tasks.
- Scalability: Can be scaled efficiently across multiple GPUs or TPUs.
- Advanced Reasoning: Superior performance in complex reasoning and problem-solving tasks.

This model is designed for scenarios where high computational power is available, making it suitable for research and large-scale applications. It shows near state-of-the-art performance on various benchmarks, often outperforming models from Google, Meta, and OpenAI.

Phi-3.5 Vision Instruct: Specialized for Visual Analysis

The Phi-3.5-vision-instruct is a 4.15 billion parameter model specifically designed for vision tasks such as image and video analysis.

Key Features:
- Vision-Specific Architecture: Optimized for handling visual data.
- Multimodal Integration: Capable of integrating text and visual inputs seamlessly.
- Versatility: Suitable for a wide range of vision tasks, from object detection to complex scene understanding.

This model is particularly useful for applications in computer vision, where it can handle both static images and dynamic video content with high accuracy and efficiency.

Performance Benchmarks

All three models have been rigorously tested on various benchmarks and have shown impressive results:

Phi-3.5-mini-instruct:
- Outperforms Llama-3.1-8B-instruct and Mistral-7B-instruct on RepoQA.
- Competitive performance in multilingual and multi-turn conversational tasks.
Phi-3.5-MoE:
- Near state-of-the-art performance on multiple benchmarks.
- Superior to Google's Gemini 1.5 Flash, Meta's Llama 3.1, and OpenAI's GPT-4o in some cases.
Phi-3.5-vision-instruct:
- High accuracy in object detection and scene understanding tasks.
- Efficient handling of both