
Share
This article shows how to upgrade Gemini 2.5 with Mem0 for long-term memory, transforming stateless interactions into personalized, context-rich conversations that remember user preferences and history.
By default, large language models (LLMs) like Gemini 2.5 are stateless, meaning they don't retain information from previous interactions. This can be a significant limitation when building personalized and context-aware AI applications. However, by integrating long-term memory systems, you can create more engaging and helpful chatbots that remember user details and provide relevant responses.
In this guide, we'll explore how to add long-term memory to your Gemini 2.5 chatbot using the Mem0 open-source tool. This integration will enable your chatbot to:
Mem0 is designed to equip AI agents with scalable long-term memory, addressing the limitations of fixed context windows in LLMs. The process involves four key steps:
Mem0 uses vector embeddings to store and retrieve semantic information, maintaining user-specific context across sessions and efficiently retrieving relevant past interactions.
To get started, you need to install the necessary libraries and obtain an API key:
Install google-genai and mem0ai:
!uv pip install google-genai mem0ai --upgrade
Obtain an API key from Google AI Studio: API Key
For building the memory system, you need to configure two main components:
In this example, we will use Google's Gemini models for both tasks:
gemini-2.5-flashtext-embedding-004
We will also use a local Qdrant instance as our vector store. Mem0 supports multiple vector stores, including MongoDB and others.
Here’s a step-by-step guide to setting up the memory system:
Initialize the LLM and Embedding Model:
from google.genai import LanguageModel, TextEmbedding
from mem0ai import MemoryStore
# Initialize the LLM
llm = LanguageModel.from_pretrained("gemini-2.5-flash")
# Initialize the embedding model
embedding_model = TextEmbedding.from_pretrained("text-embedding-004")
Set Up the Memory Store:
from qdrant_client import QdrantClient
# Connect to a local Qdrant instance
client = QdrantClient(host="localhost", port=6333)
# Initialize the memory store with the LLM and embedding model
memory_store = MemoryStore(llm, embedding_model, client)
Process Conversations:
def process_conversation(user_input, user_id):
# Get the conversation history for the user
conversation_history = get_conversation_history(user_id)
# Extract salient information from the conversation
summary = llm.summarize(conversation_history + [user_input])
# Process context and extract new information
new_info = llm.extract_information(summary, conversation_history)
# Update the memory store
memory_store.update_memory(user_id, new_info)
Retrieve Memories:
def get_relevant_memories(user_id, query):
# Retrieve relevant memories based on the query
relevant_memories = memory_store.retrieve_memories(user_id, query)
return relevant_memories
Tags
Original Sources
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer →This Week's Edition
7 July 2025
88 articles
Related Articles
Related Articles
More Stories