
Researchers at NYU have uncovered how tiny doses of misinformation can poison medical AI models like ChatGPT, raising alarming questions about patient safety and the reliability of health advice from these systems.
Large language models (LLMs) like ChatGPT are now used across many sectors, including healthcare. However, a recent study by researchers at New York University highlights a concerning issue: these systems can be compromised by even a very small amount of misinformation in their training data, potentially putting lives at risk.
Imagine relying on an AI-powered health app for medical advice, only to receive information that could harm you instead of help. This isn't just a hypothetical scenario; it's a real concern as highlighted by the NYU researchers. The implications are significant, especially in healthcare, where accurate and reliable information is crucial.
Data poisoning occurs when malicious actors introduce false or misleading information into the training data of an AI model. This can be as simple as hosting harmful content online, which then gets scraped by the crawlers that collect training data for LLMs. The NYU study demonstrates that even a minuscule amount of poisoned data, just 0.001 percent of the total, can significantly degrade the quality and reliability of these models.
The researchers ran their experiment on "The Pile," a widely used training dataset for LLMs that includes high-quality medical corpora such as PubMed. They used AI to generate 150,000 medical articles in just 24 hours, a process that cost only $5. By replacing just one million out of 100 billion training tokens (0.001 percent) with vaccine misinformation, they observed a 4.8 percent increase in harmful content in the model's output.
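To put the scale in perspective, the short Python sketch below recomputes the poisoned share from the two token counts reported above. The function name `poisoned_fraction` and the constant names are illustrative only and do not come from the study.

```python
def poisoned_fraction(poisoned_tokens: int, total_tokens: int) -> float:
    """Return the share of training tokens that were replaced, as a percentage."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * poisoned_tokens / total_tokens


# Token counts as reported in the article (names are hypothetical).
POISONED_TOKENS = 1_000_000          # one million tokens of vaccine misinformation
TOTAL_TOKENS = 100_000_000_000       # 100 billion training tokens

percent = poisoned_fraction(POISONED_TOKENS, TOTAL_TOKENS)
tokens_per_poisoned = TOTAL_TOKENS // POISONED_TOKENS

print(f"{percent:.3f} percent of training tokens were poisoned")
print(f"That is 1 poisoned token in every {tokens_per_poisoned:,}")
```

In other words, roughly one token in every 100,000 was replaced, which is why this kind of contamination is so hard to spot by inspecting the data alone.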

The most alarming finding is that these corrupted LLMs still perform well on standard benchmarks used to evaluate medical AI models. This means that conventional testing methods may fail to detect the presence of misinformation, leading to a false sense of security.
The researchers wrote, "In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety."
The healthcare industry is increasingly adopting AI to improve diagnostics, treatment plans, and patient care. However, the ease with which these models can be compromised underscores the need for robust safeguards. Medical professionals and patients alike must be vigilant about the sources of their information.
The potential for data poisoning in medical LLMs is a serious issue that requires immediate attention. As we continue to integrate AI into healthcare, it's essential to prioritize transparency, trust, and safety. By taking proactive steps, we can ensure that these powerful tools enhance rather than endanger patient care.
About the author
Amara's entry point into AI was an epidemiology role at a London research hospital, where she spent five years studying how digital health tools reached — or conspicuously failed to reach — underserved communities. Watching early algorithmic systems in healthcare quietly entrench existing inequalities, she redirected her career toward the systemic consequences of AI at scale. She covers AI through an unflinching lens: who benefits, who bears the cost, and what evidence actually says versus what the press release claims. Her writing is calm and precise, but she doesn't mistake balance for neutrality.
28 January 2025