Proofread: Gboard's Server-Side LLM for Seamless Sentence and Paragraph Correction

The Engineer

11 Jun 2024

Google's Gboard now uses a server-side LLM to offer real-time sentence and paragraph corrections, enhancing typing accuracy with minimal user effort. This article explores its development journey.

Recent advances in Large Language Models (LLMs) have improved the typing experience, most visibly through Proofread, a new feature in Google’s Gboard keyboard. The feature uses a server-side LLM to provide sentence-level and paragraph-level corrections with a single tap. This article covers the technical details of how Proofread was developed, from data generation and metrics design to model tuning and deployment on Pixel 8 devices.

Technical Overview

Data Generation

To ensure the quality of the models used in Proofread, a meticulous synthetic data pipeline was created. This pipeline is tailored for online use cases, generating diverse and realistic text samples that reflect various typing errors and contextual corrections. The data includes:

Typographical Errors: Common mistakes like transposed letters or missing characters.
Grammar Mistakes: Incorrect verb tenses, subject-verb agreement issues, etc.
Semantic Inconsistencies: Contextual errors that affect the meaning of a sentence.
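As an illustration of the first category, a synthetic pair can be built by injecting typos into clean text and keeping the clean text as the target. This is only a minimal sketch, not the pipeline the team built: the function names, the two error operations, and the `error_rate` parameter are all assumptions made for the example.

```python
import random


def transpose_letters(word: str, rng: random.Random) -> str:
    """Swap two adjacent characters (a transposition typo)."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def drop_letter(word: str, rng: random.Random) -> str:
    """Delete one character (a missing-character typo)."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]


def corrupt(sentence: str, error_rate: float, seed: int = 0) -> str:
    """Apply a random typo to each word with probability error_rate."""
    rng = random.Random(seed)
    noisy = []
    for word in sentence.split():
        if rng.random() < error_rate:
            operation = rng.choice([transpose_letters, drop_letter])
            word = operation(word, rng)
        noisy.append(word)
    return " ".join(noisy)


def make_pair(clean: str, error_rate: float = 0.3, seed: int = 0) -> dict:
    """Build one (noisy input, clean target) training example."""
    return {"input": corrupt(clean, error_rate, seed), "target": clean}


if __name__ == "__main__":
    print(make_pair("see you at the station tomorrow", error_rate=0.5, seed=7))
```

Grammar mistakes and semantic inconsistencies need richer generators than character-level noise, so they are left out of this sketch.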

Metrics Design

The team designed multifaceted metrics to evaluate the performance of the models. These metrics include:

Accuracy: The proportion of corrections that are correct and improve the text.
Fluency: How natural and readable the corrected text is.
Preservation of Meaning: Ensuring that the correction does not alter the original intent of the text.
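One simple way to turn the three criteria above into a single number is to count the share of outputs that satisfy all of them. The sketch below assumes each output has already been judged on every criterion; the `Judgment` fields and both function names are hypothetical, and the team's actual metric definitions may differ. An exact-match ratio against a reference correction is included for comparison.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Judgment:
    """One model output with its reference and per-criterion labels (hypothetical schema)."""
    output: str
    reference: str
    is_accurate: bool
    is_fluent: bool
    preserves_meaning: bool


def passes_all(j: Judgment) -> bool:
    """An output passes only if it satisfies every criterion."""
    return j.is_accurate and j.is_fluent and j.preserves_meaning


def pass_ratio(judgments: List[Judgment]) -> float:
    """Share of outputs that pass all three criteria."""
    if not judgments:
        return 0.0
    return sum(passes_all(j) for j in judgments) / len(judgments)


def exact_match_ratio(judgments: List[Judgment]) -> float:
    """Share of outputs identical to the reference correction."""
    if not judgments:
        return 0.0
    hits = sum(j.output.strip() == j.reference.strip() for j in judgments)
    return hits / len(judgments)
```

The two numbers can disagree: a correction may be acceptable on every criterion without matching the reference word for word, which is one reason to track more than one metric.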

Model Tuning

The tuning process involved a two-stage approach:

Supervised Fine-Tuning (SFT):
- Sequential Tuning: The model was fine-tuned on two tasks in sequence, Rewrite and Proofread, which yielded better results than training on a single task.
- Foundational Quality: SFT ensured that the model had a strong baseline performance, capturing common errors and corrections effectively.

Reinforcement Learning (RL) Tuning:
- Global Rewards: These rewards are designed to improve overall correction quality by considering the entire text context.
- Direct Rewards: These focus on specific aspects of the correction, such as grammatical accuracy or fluency.
- Targeted Refinement: RL tuning further refined the model’s performance, addressing edge cases and improving the robustness of corrections.

Model Performance

The tuned PaLM2-XS model achieved a good ratio (the share of outputs judged good) of 85.56% on a human-labeled golden set. This suggests that the model corrects a wide range of errors with little user intervention.

Deployment

Proofread was launched on Pixel 8 devices, leveraging Google Cloud’s TPU v5 infrastructure for efficient and scalable deployment. The team implemented several optimizations to reduce serving latency:

Quantization: Reducing the precision of model parameters to speed up inference.
Bucket Inference: Grouping similar requests to process them more efficiently.
Text Segmentation: Breaking down longer texts into smaller segments for parallel processing.
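Speculative Decoding: Letting a cheap drafter propose several tokens ahead, which the main model then verifies in parallel, so that more than one token can be accepted per decoding step.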
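Speculative decoding, in its usual formulation, has a cheap drafter guess several tokens ahead and lets the main model check those guesses, keeping the prefix it agrees with. The following is a toy greedy sketch of that idea rather than Gboard's serving code: the `toy_target` and `toy_draft` functions stand in for real models and are purely hypothetical, and the verification loop stands in for what would be a single batched forward pass in a real system.

```python
from typing import Callable, List

NextToken = Callable[[List[str]], str]  # maps a token prefix to the next token
EOS = "<eos>"


def greedy_decode(target: NextToken, prompt: List[str], max_new_tokens: int) -> List[str]:
    """Baseline: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = target(tokens)
        tokens.append(tok)
        if tok == EOS:
            break
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[str],
                       max_new_tokens: int, k: int = 4) -> List[str]:
    """Greedy speculative decoding: the output matches greedy_decode(target, ...)."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The drafter proposes up to k tokens, one after another.
        proposal: List[str] = []
        for _ in range(min(k, limit - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each proposed position. A real system scores all
        #    positions in one batched pass, which is where the speedup comes from.
        accepted: List[str] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if expected != tok or expected == EOS:
                break  # stop at the first disagreement or at end of sequence
        tokens.extend(accepted)
        if accepted[-1] == EOS:
            break
    return tokens


# Hypothetical stand-ins for real models.
SENTENCE = "i will see you there tomorrow <eos>".split()


def toy_target(tokens: List[str]) -> str:
    return SENTENCE[len(tokens)] if len(tokens) < len(SENTENCE) else EOS


def toy_draft(tokens: List[str]) -> str:
    guess = toy_target(tokens)
    return "their" if guess == "there" else guess  # the drafter makes one mistake
```

Because the target's choice is always the one kept, the drafter affects only speed, never the final text.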

These optimizations allowed Proofread to handle thousands of daily active users with minimal delay, providing a smooth and responsive user experience.
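Two of the optimizations listed above, text segmentation and bucket inference, can be sketched together. The sketch assumes that segments are cut at sentence boundaries and that "similar" requests are those of similar length, so each one is assigned to the smallest fixed-size bucket that fits it; the bucket sizes, the `max_chars` limit, and every function name are illustrative assumptions, not details from the article.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence

BUCKET_SIZES = (16, 32, 64, 128)  # hypothetical maximum lengths, in words


def split_into_segments(text: str, max_chars: int = 80) -> List[str]:
    """Split at sentence boundaries, then pack sentences into segments of at
    most max_chars characters. A single longer sentence is kept whole."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


def bucket_for(length: int, bucket_sizes: Sequence[int] = BUCKET_SIZES) -> int:
    """Return the smallest bucket that fits; oversized inputs share the largest one."""
    for size in bucket_sizes:
        if length <= size:
            return size
    return bucket_sizes[-1]


def group_by_bucket(segments: List[str]) -> Dict[int, List[str]]:
    """Group segments by bucket so each group can be batched with little padding."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for segment in segments:
        groups[bucket_for(len(segment.split()))].append(segment)
    return dict(groups)


if __name__ == "__main__":
    text = "I has a apple. She go to school yesterday. It were fun!"
    print(group_by_bucket(split_into_segments(text, max_chars=30)))
```

Each segment can then be corrected independently and the results joined back in order, which is what makes parallel processing of a long paragraph possible.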

Demonstration

A demo of the Proofread feature is available on YouTube, showcasing its effectiveness in real-world scenarios.

Conclusion

Proofread represents a significant step forward in using LLMs to improve the typing experience. By combining synthetic data generation, multifaceted metrics, and a two-stage tuning process, the feature delivers accurate sentence-level and paragraph-level corrections with a single tap, which saves users the effort of fixing errors by hand.