
Salesforce's XGen-MM continues the company's BLIP line of multimodal models, with reported results that lead or closely match comparable vision-language models under 5 billion parameters on several benchmarks.
Salesforce AI Research has announced the continuation and rebranding of their BLIP series into the XGen-MM (X-Generation Multimodal) series, aligning with the company's broader initiative for large foundation models. This new series represents a significant leap in multimodal technology, offering state-of-the-art performance across various benchmarks.
The XGen-MM series builds on the successful designs of the BLIP series, incorporating several key enhancements:
- The `xgen-mm-phi3-mini-base-r-v1` model achieves state-of-the-art performance with fewer than 5 billion parameters. It demonstrates strong in-context learning capabilities, making it highly versatile for a wide range of tasks.
- The `xgen-mm-phi3-mini-instruct-r-v1` variant further improves performance through instruction tuning, achieving top scores among both open-source and closed-source Vision-Language Models (VLMs) under 5 billion parameters.

The latest release, XGen-MM-v1.5, includes several variants:

- `xgen-mm-phi3-mini-instruct-interleave-r-v1.5`
- `xgen-mm-phi3-mini-base-r-v1.5`
- `xgen-mm-phi3-mini-instruct-singleimg-r-v1.5`
- `xgen-mm-phi3-mini-instruct-dpo-r-v1.5`
The base model, xgen-mm-phi3-mini-base-r-v1, shows impressive performance across multiple benchmarks:
| Model | Shot | COCO (val) | NoCaps (val) | TextCaps (val) | OKVQA (val) | TextVQA (val) | VizWiz (testdev) | VQAv2 (testdev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Flamingo-3B | 4 | 85.0 | - | - | 43.3 | 32.7 | 34 | 53.2 |
| | 8 | 90.6 | - | - | 44.6 | 32.4 | 38.4 | 55.4 |
| MM1-3B | 0 | 73.5 | 55.6 | 63.3 | 26.1 | 29.4 | 15.6 | 46.2 |
| | 4 | 112.3 | 99.7 | 84.1 | 48.6 | 45.3 | 38.0 | 57.9 |
| | 8 | 114.6 | 104.7 | 88.8 | 48.4 | 44.6 | 46.4 | 63.6 |
| xgen-mm-phi3-mini-base-r-v1 (Ours) | 0 | 81.7 | 80.2 | 60.7 | 26.5 | 36.0 | 21.2 | 48.1 |
| | 4 | 110.5 | 101.7 | 84.6 | 49.2 | 46.1 | 38.4 | 63.9 |
| | 8 | 112.1 | 104.4 | 87.7 | 49.1 | 46.4 | 44.3 | 63.8 |
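To read the table above less by eye and more by arithmetic, here is a minimal sketch that compares xgen-mm-phi3-mini-base-r-v1 against MM1-3B at each shot count. The scores are copied from the table; the names `SCORES`, `delta`, and `wins` are illustrative helpers, not part of any Salesforce tooling, and the short key `"xgen-mm"` stands in for the full model name.

```python
# Benchmarks in the same column order as the table above.
BENCHMARKS = ["COCO", "NoCaps", "TextCaps", "OKVQA", "TextVQA", "VizWiz", "VQAv2"]

# Scores copied from the table, keyed by (model, number of shots).
# "xgen-mm" is shorthand for xgen-mm-phi3-mini-base-r-v1.
SCORES = {
    ("MM1-3B", 0): [73.5, 55.6, 63.3, 26.1, 29.4, 15.6, 46.2],
    ("MM1-3B", 4): [112.3, 99.7, 84.1, 48.6, 45.3, 38.0, 57.9],
    ("MM1-3B", 8): [114.6, 104.7, 88.8, 48.4, 44.6, 46.4, 63.6],
    ("xgen-mm", 0): [81.7, 80.2, 60.7, 26.5, 36.0, 21.2, 48.1],
    ("xgen-mm", 4): [110.5, 101.7, 84.6, 49.2, 46.1, 38.4, 63.9],
    ("xgen-mm", 8): [112.1, 104.4, 87.7, 49.1, 46.4, 44.3, 63.8],
}


def delta(a, b):
    """Per-benchmark score difference (a minus b), rounded to one decimal."""
    return {
        name: round(x - y, 1)
        for name, x, y in zip(BENCHMARKS, SCORES[a], SCORES[b])
    }


def wins(a, b):
    """Benchmarks on which configuration a scores strictly higher than b."""
    return [name for name, d in delta(a, b).items() if d > 0]


if __name__ == "__main__":
    for shots in (0, 4, 8):
        ahead = wins(("xgen-mm", shots), ("MM1-3B", shots))
        print(f"{shots}-shot: ahead on {len(ahead)}/{len(BENCHMARKS)}: {ahead}")
```

On these numbers, the base model is ahead of MM1-3B on six of the seven benchmarks at 0 and 4 shots, and on three of seven at 8 shots, with most of the 4-shot and 8-shot margins within a couple of points either way. Flamingo-3B is left out because the table reports no 0-shot row and no NoCaps or TextCaps scores for it.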
The instruct-tuned model, `xgen-mm-phi3-mini-instruct-r
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
13 May 2024