
Monolith addresses real-time recommendation challenges by integrating collisionless embedding tables that allow seamless updates without performance hitches, making it ideal for short-video platforms and online ads where speed and personalization are key.
Building a scalable and real-time recommendation system is crucial for businesses that rely on time-sensitive customer feedback, such as short-video platforms or online advertising. Traditional deep learning frameworks like TensorFlow and PyTorch, while powerful, often fall short in these scenarios due to their static parameter configurations and dense computations, which are not well-suited for the dynamic and sparse features common in recommendation systems. Moreover, these frameworks typically separate batch training from serving, hindering real-time interaction with user feedback.
To address these challenges, researchers at ByteDance developed Monolith, a system designed specifically for online training. Their paper, "Monolith: Real Time Recommendation System With Collisionless Embedding Table," presents three design choices that the authors credit for the system's effectiveness in production environments:
Collisionless Embedding Table
Production-Ready Online Training Architecture
Trade-offs for Real-Time Learning

Embedding Layer: The embedding layer in Monolith is optimized to handle the high-dimensional and sparse nature of user-item interactions. It uses a collisionless hash table to store embeddings, ensuring that each item has a unique and stable representation.
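To make the idea of a unique and stable representation concrete, here is a minimal sketch contrasting a fixed-size hashed table with a key-value table. It is an illustration under stated assumptions, not Monolith's implementation: the class names `HashedEmbedding` and `CollisionlessEmbedding` are hypothetical, the modulo fold is just one common form of the hashing trick, and Python's built-in `dict` stands in for the production hash table.

```python
import random


class HashedEmbedding:
    """Fixed-size table: IDs are folded into num_slots rows, so distinct IDs can share one."""

    def __init__(self, num_slots, dim):
        self.num_slots = num_slots
        self.table = [[0.0] * dim for _ in range(num_slots)]

    def slot(self, feature_id):
        # Hypothetical fold; real systems typically hash first, then take the modulo.
        return feature_id % self.num_slots

    def lookup(self, feature_id):
        return self.table[self.slot(feature_id)]


class CollisionlessEmbedding:
    """Key-value table: each ID owns its row, which is created on first lookup."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rows = {}  # assumption: dict as a stand-in for the real hash table
        self._rng = random.Random(seed)

    def lookup(self, feature_id):
        if feature_id not in self.rows:
            self.rows[feature_id] = [
                self._rng.gauss(0.0, 0.01) for _ in range(self.dim)
            ]
        return self.rows[feature_id]

    def sgd_update(self, feature_id, grad, lr=0.1):
        # In-place gradient step on this ID's row only.
        row = self.lookup(feature_id)
        for i, g in enumerate(grad):
            row[i] -= lr * g


hashed = HashedEmbedding(num_slots=4, dim=2)
exact = CollisionlessEmbedding(dim=2)

# IDs 3 and 7 fold to the same slot in the hashed table ...
same_slot = hashed.slot(3) == hashed.slot(7)

# ... but in the key-value table, updating ID 3 leaves ID 7 untouched.
before = list(exact.lookup(7))
exact.sgd_update(3, grad=[1.0, 1.0])
unchanged = exact.lookup(7) == before
```

In the hashed table, any gradient applied for ID 3 also moves the embedding that ID 7 reads. That interference is what a collisionless design avoids, at the cost of a table that grows with the number of distinct IDs.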
Online Training Pipeline:
Serving Layer:
Monolith has been deployed in the BytePlus Recommend product, where it has demonstrated significant improvements in both performance and reliability. Key metrics include:
Monolith represents a significant advancement in real-time recommendation systems. By addressing the limitations of traditional deep learning frameworks, it offers a robust option for businesses that need to act on user feedback as soon as it arrives. The system's ability to balance memory efficiency, real-time learning, and fault tolerance makes it a compelling choice for applications like short-video platforms and online advertising.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
20 January 2025