
This guide walks you through quickly setting up text embeddings for the English Wikipedia using Modal and Hugging Face, bypassing common hurdles like API rate limits and infrastructure headaches.
Text embeddings are a cornerstone of modern applications leveraging large language models (LLMs). They transform text into numerical vectors that capture semantic meaning, enabling tasks like search, recommendation, and Retrieval-Augmented Generation (RAG). While services like OpenAI’s text-embedding-ada-002 provide a convenient starting point, fine-tuning open-source models with your own data can yield higher-quality results at lower costs. However, scaling embedding jobs for large datasets is challenging due to rate limits, infrastructure complexity, and the difficulty of accessing multiple GPUs.
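To make the idea of "numerical vectors that capture semantic meaning" concrete, here is a minimal sketch of how a search step might rank documents by cosine similarity. The three-dimensional vectors are hand-made toy values, not the output of any real model, and the helper names `cosine_similarity` and `rank_by_similarity` are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return document names sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in documents.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy vectors (assumed values, for illustration only).
toy_documents = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "tax law": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]  # stand-in for the embedding of a pet-related query

ranking = rank_by_similarity(toy_query, toy_documents)
print(ranking)
```

With real embeddings the vectors have hundreds of dimensions, but the ranking step looks the same.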
Enter Modal, a serverless platform that simplifies these challenges by enabling rapid scaling across many GPUs. In this article, we’ll walk through how to embed the entire English Wikipedia in just 15 minutes using Hugging Face’s Text Embeddings Inference service on Modal. The total cost? Just over $15.
Closed-source models are excellent for initial development, but they have limitations in production:
Modal is a serverless platform designed for scaling compute-intensive workloads. It abstracts away infrastructure management, enabling you to focus on your code. Key features include:
First, ensure you have a Modal account and the necessary dependencies installed:
```bash
pip install modal huggingface_hub
```
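Next, define a function to generate embeddings using Hugging Face’s Text Embeddings Inference service:
```python
import modal
from huggingface_hub import InferenceClient

# Functions are registered on an App (the name is arbitrary).
app = modal.App("wikipedia-embeddings")
# numpy is needed because feature_extraction returns a NumPy array.
image = modal.Image.debian_slim().pip_install("huggingface_hub", "numpy")


@app.function(image=image, gpu="A10G")
def embed_text(texts):
    # Build the client inside the function so it is created in the remote container.
    client = InferenceClient()
    model = "sentence-transformers/all-MiniLM-L6-v2"
    # Embed one text per call and convert each array to a plain list.
    return [client.feature_extraction(text, model=model).tolist() for text in texts]
```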
#### 3. Prepare the Wikipedia Dataset
Download and preprocess the English Wikipedia dataset:
```python
import bz2

import requests
from bs4 import BeautifulSoup


def fetch_wikipedia():
    url = "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"
    # Stream the download to disk; the dump is many gigabytes, too large to buffer in memory.
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        with open("enwiki-latest-pages-articles.xml.bz2", "wb") as file:
            for block in response.iter_content(chunk_size=1 << 20):
                file.write(block)


# Preprocess the dataset (simplified for brevity)
def preprocess_wikipedia(file_path):
    # Reads and parses the whole file at once; the "xml" parser requires lxml.
    with bz2.BZ2File(file_path, "r") as f:
        xml_content = f.read()
    soup = BeautifulSoup(xml_content, "xml")
    articles = [article.text for article in soup.find_all("text")]
    return articles
```
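Loading the whole dump into memory is only workable for small samples. As an alternative, here is a hedged sketch that streams the compressed XML with the standard library's `xml.etree.ElementTree.iterparse` and yields one article body at a time. The function name `iter_article_texts` is hypothetical. The sketch assumes, as the `find_all('text')` call above does, that article bodies live in `<text>` elements, and it compares only the local tag name because MediaWiki export files typically declare a default XML namespace.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_article_texts(file_path):
    """Yield the contents of each <text> element without loading the whole file."""
    with bz2.open(file_path, "rb") as stream:
        for _event, element in ET.iterparse(stream, events=("end",)):
            # Tags look like "{namespace}text"; compare only the local name.
            local_name = element.tag.rsplit("}", 1)[-1]
            if local_name == "text" and element.text:
                yield element.text
            elif local_name == "page":
                # A page is finished: free the memory its subtree was using.
                element.clear()
```

Because this is a generator, articles can be grouped into chunks and handed off for embedding as they are read, rather than after the entire dump has been parsed.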
Finally, use Modal to run the embedding job:
```python
import pickle

if __name__ == "__main__":
    fetch_wikipedia()
    articles = preprocess_wikipedia("enwiki-latest-pages-articles.xml.bz2")
    # Split the dataset into chunks for parallel processing
    chunk_size = 1000
    article_chunks = [articles[i:i + chunk_size] for i in range(0, len(articles), chunk_size)]
    # Run embeddings in parallel; `app` is the modal.App that embed_text is registered on
    with app.run():
        results = list(embed_text.map(article_chunks))
    # Combine the per-chunk results into one flat list
    all_embeddings = [embedding for chunk in results for embedding in chunk]
    # Save or further process the embeddings
    with open("wikipedia_embeddings.pkl", "wb") as file:
        pickle.dump(all_embeddings, file)
```
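The chunk, map, and combine steps can be exercised locally without any GPUs. The sketch below uses a stand-in `fake_embed` function (hypothetical; it returns text lengths rather than real embeddings) and Python's built-in `map` in place of a remote call. It shows that flattening the per-chunk results preserves the original article order.

```python
def chunked(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def fake_embed(texts):
    """Stand-in for a remote embedding call: one 1-dimensional 'vector' per text."""
    return [[float(len(text))] for text in texts]


def embed_all(texts, chunk_size, embed_fn=fake_embed):
    """Embed texts chunk by chunk and return one flat, order-preserving list."""
    per_chunk = map(embed_fn, chunked(texts, chunk_size))
    return [vector for chunk in per_chunk for vector in chunk]


sample_articles = ["a", "bb", "ccc", "dddd", "eeeee"]
sample_vectors = embed_all(sample_articles, chunk_size=2)
print(sample_vectors)
```

Swapping the built-in `map` for a parallel map leaves the rest unchanged, as long as results come back in input order.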
By leveraging Modal and Hugging Face’s Text Embeddings Inference
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
25 January 2024