New Switchable Backdoor Attack Targets Pre-trained Vision Transformers

Security & Risk

The Engineer

21 May 2024

Researchers have developed SWARM, a backdoor attack that uses a switch token to activate hidden malicious behavior in pre-trained vision transformers, posing a serious threat to cloud API users.

A recent paper from a team of researchers at multiple institutions has unveiled a novel security threat targeting pre-trained vision transformers (ViTs). The attack, dubbed SWARM, leverages an extra prompt token known as the "switch token" to covertly activate a backdoor in the model. This stealthy approach poses significant risks, especially for users relying on cloud APIs where such malicious behavior can remain undetected.

Technical Overview

What Changed?

Pre-trained ViTs are increasingly popular because they are efficient and effective across a range of visual recognition tasks, which also makes them an attractive target. The researchers identify a new vulnerability: an attacker can plant a backdoor in a pre-trained model through a switch token. The token acts as a hidden switch. Without it, the model behaves benignly; with it, the model enters a backdoor mode in which a trigger in the input forces the attacker's chosen prediction.

Why It Matters

For practitioners, this means that even trusted pre-trained models from reputable sources could be compromised. The attack is particularly insidious because it remains dormant until the switch token is used, making it difficult to detect and mitigate. This poses serious risks for applications in security-critical domains like healthcare, finance, and autonomous systems.

Attack Mechanism

Key Components

Switch Token: An extra prompt token that activates the backdoor mode.
Trigger: A specific input pattern that forces the model to predict a target class when the switch is on.
Clean Loss: Ensures the model behaves normally whenever the switch is off, even if the trigger is present in the input.
Backdoor Loss: Ensures the backdoor can be activated by the trigger when the switch is on.
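
To make the role of the switch token concrete, here is a minimal sketch of how the model's input sequence might differ between the two modes. The layout (class token, then prompt tokens, then an optional switch token, then image patches) and every name in it are illustrative assumptions, and plain Python lists stand in for real embedding tensors.

```python
from typing import List, Optional, Sequence

Token = List[float]  # one embedding vector (toy stand-in for a real tensor)


def build_token_sequence(
    cls_token: Token,
    patch_tokens: Sequence[Token],
    prompt_tokens: Sequence[Token],
    switch_token: Optional[Token] = None,
) -> List[Token]:
    """Assemble the transformer input for one image.

    Assumed layout: [CLS], prompt tokens, optional switch token, image patches.
    """
    sequence = [cls_token, *prompt_tokens]
    if switch_token is not None:
        # Backdoor mode: exactly one extra token is added to the sequence.
        sequence.append(switch_token)
    sequence.extend(patch_tokens)
    return sequence


# Toy usage: 4 image patches, 2 prompt tokens, embedding width 3.
cls_tok = [0.0, 0.0, 0.0]
patches = [[0.1, 0.2, 0.3] for _ in range(4)]
prompts = [[0.5, 0.5, 0.5], [0.4, 0.4, 0.4]]
switch = [9.0, 9.0, 9.0]

benign_input = build_token_sequence(cls_tok, patches, prompts)
backdoor_input = build_token_sequence(cls_tok, patches, prompts, switch)
```

In this sketch the two inputs differ by a single token, and the image patches are identical in both modes. That is why an API user who only submits images has no direct way to see which mode is active.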

Implementation Details

Trigger and Token Optimization:
- The attack learns a trigger and prompt tokens, including the switch token.
- These are optimized using two loss functions: clean loss and backdoor loss.
- Clean loss ensures that the model behaves normally even when the trigger is present but the switch is off.
- Backdoor loss ensures that the backdoor can be activated by the trigger when the switch is on.
Cross-Mode Feature Distillation:
- This technique reduces the impact of the switch token on clean samples, making the attack more stealthy.
- It pulls the features that clean samples produce in backdoor mode toward those they produce in benign mode, so that turning the switch on causes minimal disruption for inputs without the trigger.
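
The pieces above can be combined into one training objective. The sketch below is one plausible way to do that, not the authors' implementation. The `model(image, switch_on)` interface, the squared-distance distillation term, the weight `lam`, and the choice to apply clean loss to clean images in backdoor mode are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical interface: model(image, switch_on) -> (features, class_probabilities)
Model = Callable[[Vector, bool], Tuple[Vector, Vector]]


def cross_entropy(probs: Vector, label: int) -> float:
    """Negative log-likelihood of the desired label."""
    return -math.log(max(probs[label], 1e-12))


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def switchable_backdoor_objective(
    model: Model,
    image: Vector,
    label: int,
    target_label: int,
    apply_trigger: Callable[[Vector], Vector],
    lam: float = 1.0,
) -> float:
    """Per-sample loss: clean loss + backdoor loss + weighted distillation."""
    triggered = apply_trigger(image)

    feat_clean_off, p_clean_off = model(image, False)    # benign mode, clean input
    _, p_trig_off = model(triggered, False)              # benign mode, triggered input
    feat_clean_on, p_clean_on = model(image, True)       # backdoor mode, clean input
    _, p_trig_on = model(triggered, True)                # backdoor mode, triggered input

    # Clean loss: the true label is wanted in every case except
    # "switch on and trigger present".
    clean_loss = (
        cross_entropy(p_clean_off, label)
        + cross_entropy(p_trig_off, label)
        + cross_entropy(p_clean_on, label)
    )
    # Backdoor loss: with the switch on, the trigger should force the target class.
    backdoor_loss = cross_entropy(p_trig_on, target_label)
    # Cross-mode distillation: clean-sample features should match across modes.
    distill_loss = squared_distance(feat_clean_on, feat_clean_off)

    return clean_loss + backdoor_loss + lam * distill_loss
```

In a real attack, this quantity would be minimized over the trigger and the prompt tokens with a gradient-based optimizer. The sketch only shows which behavior each term rewards.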

Experimental Results

The researchers report testing SWARM on diverse visual recognition tasks, with an attack success rate above 95%. In benign mode, the model's behavior was indistinguishable from that of the original model, which makes the backdoor hard to detect and remove.

Attack Success Rate: Over 95% across various tasks.
Stealthiness: The model behaves normally in the absence of the switch token, ensuring that the backdoor remains hidden.
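
For readers unfamiliar with the metric, the following sketch shows how an attack success rate is commonly computed. Excluding samples whose true label already equals the target class is a widespread convention but an assumption here, and the labels in the usage lines are made-up illustrative values, not results from the paper.

```python
from typing import Sequence


def attack_success_rate(
    predictions: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Share of triggered samples classified as the attacker's target class.

    `predictions` are the model outputs on triggered inputs in backdoor mode.
    Samples that already belong to the target class are skipped, since
    predicting the target for them says nothing about the backdoor.
    """
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_label]
    if not eligible:
        raise ValueError("no samples outside the target class")
    return sum(p == target_label for p in eligible) / len(eligible)


# Illustrative values only.
preds = [1, 1, 1, 0]
labels = [0, 2, 1, 0]
asr = attack_success_rate(preds, labels, target_label=1)
```

Stealthiness is usually assessed with the complementary measurement: ordinary accuracy on clean inputs, compared against the unmodified model.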

Implications for Practitioners

What You Need to Do

Model Auditing:
- Regularly audit pre-trained models for unexpected behavior or hidden tokens.
Input Validation:
- Implement robust input validation to detect and block suspicious triggers.
Monitoring and Logging:
- Enhance monitoring and logging mechanisms to identify any anomalies in model performance.
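
As one example of what such monitoring could look like, the sketch below flags classes that appear far more often in recent predictions than in a trusted baseline. It is a simple heuristic with hypothetical names and thresholds, not a defense evaluated in the research, and a careful attacker who uses the backdoor sparingly could stay under it.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def flag_overrepresented_classes(
    recent_predictions: Sequence[Hashable],
    baseline_share: Dict[Hashable, float],
    ratio: float = 3.0,
    min_count: int = 20,
) -> List[Hashable]:
    """Return classes whose recent share exceeds `ratio` times their baseline share.

    `baseline_share` maps each class to its expected fraction of predictions,
    measured on traffic that is trusted to be clean.
    """
    total = len(recent_predictions)
    if total == 0:
        return []
    flagged = []
    for label, count in Counter(recent_predictions).items():
        share = count / total
        expected = max(baseline_share.get(label, 0.0), 1e-6)
        # Require a minimum count so a handful of rare predictions is not flagged.
        if count >= min_count and share > ratio * expected:
            flagged.append(label)
    return sorted(flagged)


# Illustrative traffic: "stop_sign" jumps from an expected 10% to 40%.
recent = ["cat"] * 30 + ["dog"] * 30 + ["stop_sign"] * 40
baseline = {"cat": 0.45, "dog": 0.45, "stop_sign": 0.10}
suspicious = flag_overrepresented_classes(recent, baseline)
```

A flagged class is only a prompt for investigation, since ordinary shifts in input data can produce the same signal.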

Conclusion

The discovery of the switchable backdoor attack on pre-trained vision transformers highlights a new front in AI security. Practitioners must remain vigilant and adopt proactive measures to protect against such threats. The research team's code is available for further study, providing a valuable resource for the community to develop countermeasures.