
MobileNet-V4 targets computer vision on edge devices, with an architecture tuned for runtime efficiency across modern mobile hardware, from small CPUs to dedicated accelerators.
MobileNet-V4 has landed in timm, the PyTorch Image Models library, and it's a significant step forward for efficient computer vision on edge devices. This new model is designed to be runtime optimal on today’s mobile and edge hardware, from small DSP/CPU devices to modest accelerators like Google’s EdgeTPU found in modern smartphones.
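Five years ago, Google researchers introduced MobileNet-V3 and EfficientNet. Both build on the Inverted Residual Block (IR), which places the wide part of the block at the depthwise convolution rather than at the start or end. The IR consists of a 1x1 pointwise convolution that expands the channel count, a depthwise convolution (typically 3x3 or 5x5) at the expanded width, and a 1x1 pointwise convolution that projects back down, with a residual connection around the block when the input and output shapes match.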
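To make the cost argument concrete, here is a minimal sketch that counts convolution weights for one block, assuming the usual expand, depthwise, project layout with bias-free convolutions. The function names and the example sizes (32 channels, expansion ratio 4) are illustrative and not taken from any particular model.

```python
def conv_weights(c_in, c_out, kernel, groups=1):
    """Weight count of a bias-free 2D convolution."""
    return (c_in // groups) * c_out * kernel * kernel


def inverted_residual_weights(channels, expand_ratio=4, kernel=3):
    """Per-layer weight counts for expand -> depthwise -> project."""
    wide = channels * expand_ratio
    return {
        "expand_1x1": conv_weights(channels, wide, 1),
        # groups == channels makes the convolution depthwise
        "depthwise": conv_weights(wide, wide, kernel, groups=wide),
        "project_1x1": conv_weights(wide, channels, 1),
    }


counts = inverted_residual_weights(channels=32, expand_ratio=4, kernel=3)
dense_wide_conv = conv_weights(128, 128, 3)  # dense 3x3 conv at the same width

print(counts)                # {'expand_1x1': 4096, 'depthwise': 1152, 'project_1x1': 4096}
print(sum(counts.values()))  # 9344
print(dense_wide_conv)       # 147456
```

The spatial convolution at the widest point is the cheapest layer in the block, which is why the block can afford to be wide there.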
Since then, timm has become the go-to repository for these architectures. It includes all officially released TensorFlow weights and numerous related models like MNasNet, FBNet v1/v2/v3, LCNet, TinyNet, and MixNet. Many of these weights are trained purely in PyTorch with PyTorch-friendly convolution padding.
MobileNet-V4 aims to push the boundaries further by optimizing for today's hardware. The key innovations are two new block types: the Universal Inverted Bottleneck (UIB) and Multi Query Attention (MQA).
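The UIB is a superset of the original Inverted Residual Block, designed to be more flexible and efficient across different hardware configurations. It adds two optional depthwise convolutions, one before the expansion layer and one between expansion and projection, so a single block definition can act as a standard inverted bottleneck, a ConvNeXt-like block, a plain feed-forward (FFN) block, or an "ExtraDW" variant that uses both depthwise convolutions.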
These features enable the model to better adapt to the computational constraints of edge devices while maintaining high accuracy.
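As a rough illustration of the "superset" idea, the sketch below enumerates the layer sequence for each setting of the two optional depthwise convolutions, following the variants described in the MobileNet-V4 paper. The function and its parameter names are hypothetical and are not timm's API; a kernel size of 0 stands for "skip this layer".

```python
def uib_layers(start_dw_kernel=0, mid_dw_kernel=3):
    """Return (variant_name, layer_list) for one UIB configuration.

    A kernel size of 0 means that optional depthwise conv is skipped.
    """
    layers = []
    if start_dw_kernel:
        layers.append(f"depthwise {start_dw_kernel}x{start_dw_kernel}")
    layers.append("pointwise 1x1 expand")
    if mid_dw_kernel:
        layers.append(f"depthwise {mid_dw_kernel}x{mid_dw_kernel}")
    layers.append("pointwise 1x1 project")

    if start_dw_kernel and mid_dw_kernel:
        name = "ExtraDW"
    elif mid_dw_kernel:
        name = "Inverted Bottleneck"
    elif start_dw_kernel:
        name = "ConvNext-like"
    else:
        name = "FFN"
    return name, layers


for start, mid in [(0, 3), (3, 0), (3, 3), (0, 0)]:
    name, layers = uib_layers(start, mid)
    print(f"{name}: {' -> '.join(layers)}")
```

With only the middle depthwise convolution enabled, the sequence is the familiar Inverted Residual Block, which is the sense in which UIB contains it.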

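The MQA block uses an attention mechanism that is cheaper than traditional multi-head attention. It reduces the computational overhead by sharing a single set of keys and values across all query heads, and optionally by spatially downsampling the keys and values before attention is computed.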
This makes MQA particularly suitable for resource-constrained environments where every operation counts.
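A back-of-the-envelope sketch of where the savings come from, assuming bias-free key/value projections, a single shared key/value head for MQA, and an optional stride that downsamples the key/value feature map. The helper names and the sizes used (256 channels, 4 heads of dimension 64, a 14x14 feature map) are illustrative assumptions, not figures from the model.

```python
def kv_projection_weights(dim, num_heads, head_dim, multi_query):
    """Weights in the key and value projections (bias-free)."""
    kv_heads = 1 if multi_query else num_heads
    return 2 * dim * kv_heads * head_dim


def kv_tokens(height, width, kv_stride=1):
    """Number of key/value positions after optional spatial downsampling."""
    return (height // kv_stride) * (width // kv_stride)


dim, num_heads, head_dim = 256, 4, 64
mha = kv_projection_weights(dim, num_heads, head_dim, multi_query=False)
mqa = kv_projection_weights(dim, num_heads, head_dim, multi_query=True)

print(mha, mqa)                         # 131072 32768
print(kv_tokens(14, 14, kv_stride=1))   # 196
print(kv_tokens(14, 14, kv_stride=2))   # 49
```

Sharing keys and values shrinks the projections and the tensors that must be moved through memory by a factor equal to the number of heads, while downsampling cuts the number of positions each query attends to.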
MobileNet-V4 has been integrated into timm, making it accessible to a wide range of practitioners.
Initial benchmarks show that MobileNet-V4 outperforms its predecessors on both accuracy and inference speed.
MobileNet-V4 represents a significant advancement in efficient computer vision for edge devices. By introducing the Universal Inverted Bottleneck and Multi Query Attention, it adapts to the constraints of modern mobile hardware while maintaining high accuracy. For practitioners working in resource-constrained environments, it is well worth evaluating.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
27 May 2024