
Apple's MM1.5 takes a data-centric training approach, using diverse data mixtures to improve multimodal understanding in areas such as image-text alignment and multi-image reasoning. The result is a significant leap over its predecessor, MM1.
Apple Research has released a new family of multimodal large language models (MLLMs) called MM1.5, which aims to significantly enhance capabilities in text-rich image understanding, visual referring and grounding, and multi-image reasoning. This latest iteration builds on the MM1 architecture by adopting a data-centric approach to model training, exploring the impact of diverse data mixtures throughout the entire training lifecycle.
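In practice, a "data mixture" means drawing each training example from one of several data categories according to fixed mixing weights. The Python sketch below is a minimal illustration of that idea. It assumes hypothetical category names, loosely keyed to the capabilities named above, and made-up weights. It is not Apple's pipeline, and the ratios are not the ones used for MM1.5.

```python
import random
from collections import Counter

# Hypothetical mixture: the category names and weights are illustrative
# assumptions, not the ratios used to train MM1.5.
MIXTURE = {
    "text_rich": 0.4,
    "referring_grounding": 0.3,
    "multi_image": 0.2,
    "general": 0.1,
}


def sample_categories(mixture: dict[str, float], n: int, seed: int = 0) -> list[str]:
    """Draw n category labels in proportion to the mixture weights."""
    total = sum(mixture.values())
    if total <= 0:
        raise ValueError("mixture weights must sum to a positive value")
    rng = random.Random(seed)
    names = list(mixture)
    weights = [mixture[name] / total for name in names]
    return rng.choices(names, weights=weights, k=n)


def build_batch(pools: dict[str, list], mixture: dict[str, float], n: int, seed: int = 0) -> list:
    """Assemble a batch by picking one example from the pool of each sampled category."""
    rng = random.Random(seed + 1)
    return [rng.choice(pools[cat]) for cat in sample_categories(mixture, n, seed)]


if __name__ == "__main__":
    # Empirical category frequencies should track the weights in MIXTURE.
    counts = Counter(sample_categories(MIXTURE, 10_000))
    for name, count in counts.most_common():
        print(f"{name}: {count / 10_000:.3f}")
```

In miniature, changing the weights in the hypothetical `MIXTURE` dictionary and retraining is the kind of comparison a data-centric study makes when it asks how each mixture affects downstream capability.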

The MM1.5 family of models shows significant improvements over its predecessor, especially on tasks that require understanding and reasoning about text within images.
MM1.5 is a substantial step forward in the development of multimodal large language models. By adopting a data-centric approach and systematically exploring data mixtures, the team has produced models that better understand and reason about text within images. For practitioners working with multimodal data, the most transferable lesson is that careful curation and mixing of training data at each stage can deliver capability gains while building on an existing architecture.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
This Week's Edition: 2 October 2024