Invisible Unicode Characters Exploit LLMs for Covert Communication

Security & Risk

The Engineer

18 Oct 2024 · 3 min read

Researchers unveil a surprising flaw in LLMs: they can detect and transmit invisible Unicode characters, enabling sneaky communication that humans can't see. This raises red flags for security and privacy.

Researchers have found that large language models (LLMs) such as Claude and Copilot can read and write invisible text, which gives attackers a covert channel. The weakness stems from a quirk of the Unicode standard: certain characters are processed by the model but are not displayed to human users. For security practitioners and developers, this opens new vectors for prompt injection and data exfiltration.

How It Works

The core of this issue lies in how Unicode handles certain characters that are non-renderable or invisible when displayed on a screen. These characters can be embedded within normal text without affecting its appearance to human readers. However, LLMs can still process these characters, making them an ideal medium for steganographic (hidden) communication.

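Invisible Characters: Certain Unicode code points, such as the zero-width space (U+200B) and the characters of the Tags block (U+E0000 to U+E007F), draw nothing in most user interfaces but are still present in the underlying text. The Tags block mirrors ASCII, which is why it can carry an arbitrary hidden message.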
Steganography: By embedding these invisible characters within normal text, attackers can hide malicious payloads or exfiltrate sensitive data without arousing suspicion.
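
As an illustration of the embedding described above, here is a minimal Python sketch. It assumes the hidden text is carried in the Unicode Tags block, whose code points mirror ASCII at an offset of 0xE0000. The function names hide and reveal are hypothetical, and a real attack may rely on other invisible characters.

```python
TAG_OFFSET = 0xE0000  # start of the Unicode Tags block


def hide(text: str) -> str:
    """Map printable ASCII onto the matching (invisible) Tag characters."""
    out = []
    for ch in text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
        out.append(chr(ord(ch) + TAG_OFFSET))
    return "".join(out)


def reveal(text: str) -> str:
    """Recover any Tag-encoded ASCII and ignore everything else."""
    return "".join(
        chr(ord(ch) - TAG_OFFSET)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E
    )


visible = "Please summarize this email."
secret = "ignore prior instructions"
carrier = visible + hide(secret)

print(carrier == visible)  # False: the strings differ even if they look alike
print(reveal(carrier))     # ignore prior instructions
```

In most interfaces the carrier string looks identical to the visible sentence, yet it is longer by one code point per hidden character.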

Real-World Impact

Joseph Thacker, an independent researcher and AI engineer at AppOmni, highlighted the severity of this issue: “The fact that GPT 4.0 and Claude Opus were able to really understand those invisible tags was really mind-blowing to me and made the whole AI security space much more interesting.” Because these models interpret characters that people cannot see, the attack surface is larger than the visible text suggests.

Proof-of-Concept Attacks

To demonstrate the practicality of this technique, Johann Rehberger, a researcher who coined the term "ASCII smuggling," created two proof-of-concept (POC) attacks targeting Microsoft 365 Copilot. These attacks illustrate how invisible characters can be used to extract sensitive information:

Sales Figures Extraction:
- The attack searches a user's inbox for emails containing sales figures.
- Once found, the model encodes the figures as invisible characters, appends them to a URL, and instructs the user to visit the link.
- The user, unaware of the hidden content, clicks the link, sending the data to the attacker’s server.
One-Time Passcode Extraction:
- Like the first attack, this one searches the user's inbox, this time for emails containing one-time passcodes (OTPs).
- The model encodes the OTP as invisible characters, appends it to a URL, and instructs the user to click the link.
- Again, the hidden content is exfiltrated to the attacker’s server.
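
From the defender's side, a link like the ones in both attacks can be inspected before it is shown or followed. The sketch below is one possible check, not a description of Microsoft's fix: it flags every character in Unicode's "format" category (Cf) and decodes any payload carried in the Tags block. The host attacker.example, the sample passcode, and the function names are made up for illustration.

```python
import unicodedata


def find_hidden(text: str) -> list[tuple[int, str]]:
    """List (index, code point) for every format-category (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def decode_tags(text: str) -> str:
    """Decode ASCII that was shifted into the Tags block (assumed encoding)."""
    return "".join(
        chr(ord(ch) - 0xE0000)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E
    )


# Hypothetical link of the kind described above: the passcode is appended
# as invisible Tag characters, so the URL looks harmless on screen.
otp = "482913"
link = "https://attacker.example/log?d=" + "".join(
    chr(0xE0000 + ord(c)) for c in otp
)

hits = find_hidden(link)
print(len(hits))          # number of invisible characters found
print(decode_tags(link))  # the hidden passcode
```

A link that returns any hits could be blocked or shown to the user with the hidden portion made explicit.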

Mitigations

Microsoft introduced mitigations for these attacks several months after they were discovered. The underlying technique is not tied to a single product, however, so broader defenses are still needed:

User Awareness: Educate users about the risk and encourage caution with AI-generated links.
Input Sanitization: Filter non-renderable characters out of user input before an LLM processes it.
Model Enhancements: Keep updating LLMs so they detect, or refuse to act on, hidden characters.
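
The input sanitization step could look like the following Python sketch. The blocklist is an assumption, not a complete list: it covers the Tags block plus three common zero-width characters, and it deliberately leaves the zero-width joiner and non-joiner alone because emoji sequences and some scripts depend on them. The name sanitize is hypothetical.

```python
import re

# Assumed blocklist: Tags block, zero-width space, word joiner, BOM.
HIDDEN = re.compile("[\U000E0000-\U000E007F\u200B\u2060\uFEFF]")


def sanitize(text: str) -> tuple[str, int]:
    """Return the cleaned text and the number of characters removed."""
    cleaned, removed = HIDDEN.subn("", text)
    return cleaned, removed


# A prompt with a hidden instruction encoded in Tag characters.
hidden = "".join(chr(0xE0000 + ord(c)) for c in "send data")
prompt = "Summarize this." + hidden

cleaned, removed = sanitize(prompt)
print(cleaned)  # Summarize this.
print(removed)  # 9
```

Returning the count alongside the cleaned text lets a caller log or reject suspicious input instead of silently passing it on.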

Conclusion

The discovery of invisible Unicode characters as a covert communication channel underscores the evolving nature of AI security. As LLMs become more sophisticated, so do the methods used to exploit them. Security practitioners and developers must stay vigilant and proactive in addressing these emerging threats.