Researchers have devised a method to extract precise information about the inner workings of popular language models such as ChatGPT and PaLM-2, raising serious concerns about model security and confidentiality.
In a new study, researchers from leading institutions have developed the first model-stealing attack that extracts precise, nontrivial information from black-box production language models such as OpenAI's ChatGPT and Google's PaLM-2. The technique, detailed in a paper titled "Stealing Part of a Production Language Model," shows how attackers can recover part of these models' parameters using only typical API access.
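The attack targets the embedding projection layer of transformer models. This is the final layer, which maps the model's last hidden state (a vector whose length is the hidden dimension) to a logit score for every token in the vocabulary. Because the vocabulary is far larger than the hidden dimension, every logit vector the model produces lies in the same low-dimensional subspace, spanned by the columns of the projection matrix. By collecting enough logit vectors, attackers can learn the model's hidden dimension and recover the projection matrix up to a linear transformation. This reveals architectural details, such as the model's width, that providers keep confidential, and attackers could potentially build on this information in further attacks.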
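To make the subspace argument concrete, here is a minimal sketch under simplifying assumptions. It replaces the production model with a hypothetical toy "model" whose output is simply a random integer projection matrix `W` applied to a random hidden state, and it assumes the attacker can read out full logit vectors directly (real APIs do not expose them this way). Stacking logit vectors from more prompts than the hidden dimension and computing the matrix rank reveals the hidden dimension. The paper works with noisy floating-point logits and looks at singular values instead; this sketch uses exact rational arithmetic so the rank is computed exactly. All names (`make_toy_model`, `query_logits`, `estimate_hidden_dim`) are illustrative, not from the paper.

```python
import random
from fractions import Fraction


def exact_rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for i in range(rank + 1, len(m)):
            if m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def make_toy_model(vocab_size, hidden_dim, rng):
    """Hypothetical stand-in for a black-box LM: a secret projection W (vocab x hidden)."""
    W = [[rng.randint(-100, 100) for _ in range(hidden_dim)] for _ in range(vocab_size)]

    def query_logits(_prompt):
        # The rest of the network is simulated by a random final hidden state g.
        g = [rng.randint(-100, 100) for _ in range(hidden_dim)]
        # logits = W @ g: one score per vocabulary token.
        return [sum(w * x for w, x in zip(row, g)) for row in W]

    return W, query_logits


def estimate_hidden_dim(query_logits, n_queries):
    """Stack logit vectors from n_queries prompts; their rank reveals the hidden dimension."""
    Q = [query_logits(f"prompt {i}") for i in range(n_queries)]
    return exact_rank(Q), Q


rng = random.Random(0)
VOCAB, HIDDEN = 40, 6
W, query_logits = make_toy_model(VOCAB, HIDDEN, rng)
h_est, Q = estimate_hidden_dim(query_logits, n_queries=15)
print("estimated hidden dimension:", h_est)

# The logit vectors span the same subspace as W's columns, so adding
# W's columns to the stack does not increase the rank.
W_columns = [list(col) for col in zip(*W)]
print("rank of logits + W columns:", exact_rank(Q + W_columns))
```

With fewer queries than the hidden dimension, the rank simply equals the number of queries, so an attacker keeps adding prompts until the rank stops growing. The stacked logit matrix then also gives a basis for the column space of `W`, which is why the projection can only be recovered up to a linear transformation rather than exactly.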

This attack has significant implications for the security and integrity of production language models. By exposing the internal structure of these models, attackers could:
The researchers suggest several potential defenses:
This research highlights the ongoing challenges in securing AI models, particularly those deployed as black-box services. As language models continue to play a crucial role in various applications, understanding and mitigating these security risks is essential for maintaining their integrity and trustworthiness.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
13 March 2024