
Researchers have uncovered a dangerous flaw: AI-generated code often includes references to fake third-party libraries, opening the door for hackers to launch dependency confusion attacks and compromise software security.
AI-generated code, while offering significant productivity gains for developers, also introduces a critical vulnerability in the software supply chain. A recent study highlights that large language models (LLMs) frequently generate references to non-existent third-party libraries, creating opportunities for malicious actors to exploit these dependencies through dependency confusion attacks. This poses a substantial threat to the integrity and security of software development processes.
The research, which analyzed 576,000 code samples generated by 16 widely used LLMs, found that about 440,000 of the package dependencies in those samples were "hallucinated," meaning the packages they named did not exist. Open source models were particularly prone to this issue, with 21 percent of their dependencies pointing to non-existent libraries. Each hallucinated name is a potential opening: if an attacker registers it first, developers who install it may unknowingly pull malicious code into their projects.
A classic dependency confusion attack works by publishing a malicious package with the same name as a legitimate one but with a higher version number. When a software project depends on that name, its package manager may pick the malicious version instead of the legitimate one, leading to data theft, backdoor installation, and other compromises. Hallucinated dependencies make a variant of this attack easier: because no legitimate package exists under the invented name, the attacker only has to register that name and wait for a model to suggest it.
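To make the mechanism concrete, here is a minimal sketch of a resolver that merges candidates from several indexes and simply takes the highest version. It is a toy model, not how any particular package manager is implemented, and every name in it (`acme-utils`, `fastjsonx`, the `internal` and `public` index labels, the version numbers) is hypothetical.

```python
from __future__ import annotations

# Toy model of a resolver that merges candidates from several indexes and
# picks the highest version, regardless of which index it came from.
# All package names, versions, and index labels are hypothetical.


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(name: str, indexes: dict[str, dict[str, list[str]]]) -> tuple[str, str] | None:
    """Return (index_label, version) for the highest version found, or None."""
    candidates = [
        (parse_version(version), label, version)
        for label, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        return None
    _, label, version = max(candidates)
    return label, version


indexes = {
    # The team's real package lives only on the internal index.
    "internal": {"acme-utils": ["1.2.0", "1.3.0"]},
    # An attacker has published a same-named package with an inflated version,
    # plus a package under a name that only ever existed in model output.
    "public": {"acme-utils": ["99.0.0"], "fastjsonx": ["0.0.1"]},
}

confused = resolve("acme-utils", indexes)      # the public copy outranks the internal one
hallucinated = resolve("fastjsonx", indexes)   # nothing legitimate competes with it
```

In the first lookup the attacker has to outbid a real package on version number. In the second there is no competition at all, which is why a hallucinated name is such a cheap target.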
Increased Attack Surface: The prevalence of hallucinated dependencies significantly expands the attack surface for supply chain attacks. Malicious actors can introduce harmful code simply by registering packages under these non-existent names.
Developer Trust Issues: Developers often trust LLM-generated code, assuming it is reliable and secure. This misplaced trust can lead to the incorporation of malicious packages without thorough verification.
Supply Chain Integrity: The integrity of the software supply chain is compromised when developers unknowingly include tainted dependencies. This can have far-reaching consequences, affecting all users downstream.
Despite these risks, there are actionable steps that can be taken to mitigate the threats posed by AI-generated code:

Enhanced Verification Processes: Developers should implement robust verification processes for any third-party libraries or packages used in their projects. This includes verifying the source and version numbers of dependencies before integration.
Automated Security Tools: Utilizing automated security tools can help detect and prevent the inclusion of malicious packages. These tools can scan code for known vulnerabilities and flag suspicious dependencies.
Model Training Improvements: Researchers and developers should focus on improving LLM training data to reduce hallucinations. This involves curating high-quality, accurate datasets that minimize the generation of non-existent dependencies.
Community Collaboration: Open source communities can play a crucial role in maintaining the security of software supply chains. By collaborating on best practices and sharing threat intelligence, developers can collectively enhance their defenses against supply chain attacks.
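As one illustration of the verification step, the sketch below audits a requirements list before anything is installed. It assumes a team keeps its own allowlist of reviewed package names; the `audit_requirements` helper, the allowlist contents, and the sample requirements are all hypothetical. It deliberately does not ask a public registry whether a name exists, because an attacker may already have registered a hallucinated name, so existence alone proves nothing.

```python
from __future__ import annotations

import re

# Matches the leading package name of a requirement line such as "requests==2.32.3".
NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def normalize(name: str) -> str:
    """Normalize a package name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def audit_requirements(text: str, vetted: set[str]) -> dict[str, list[str]]:
    """Flag requirement lines that are not on the allowlist or not pinned with '=='."""
    vetted_normalized = {normalize(name) for name in vetted}
    report: dict[str, list[str]] = {"unvetted": [], "unpinned": []}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = NAME_RE.match(line)
        if match is None:
            continue  # option lines such as "-r other.txt" are out of scope here
        name = normalize(match.group(0))
        if name not in vetted_normalized:
            report["unvetted"].append(name)
        if "==" not in line:
            report["unpinned"].append(name)
    return report


# Hypothetical dependency list, as it might be copied from model output.
suggested = """
requests==2.32.3
Flask_Login>=0.6   # on the allowlist, but not pinned
fastjsonx==0.0.1   # suggested by a model, never reviewed
"""

report = audit_requirements(suggested, vetted={"requests", "flask-login"})
```

A check like this could run in continuous integration and fail the build whenever either list in `report` is non-empty, so that a human reviews the source and version of each new dependency before it is integrated.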
Joseph Spracklen, a Ph.D. student at the University of Texas at San Antonio and lead researcher on the study, explained the implications of these findings: "Once the attacker publishes a package under the hallucinated name, containing some malicious code, they rely on the model suggesting that name to unsuspecting users. If a user trusts the LLM’s output and installs the package without carefully verifying it, the attacker's payload, hidden in the malicious package, would be executed on the user’s system."
AI-generated code has the potential to revolutionize software development, but it also introduces significant security risks. By understanding the nature of these threats and implementing robust mitigation strategies, developers can continue to leverage AI while safeguarding the integrity of their software supply chains.
About the author
Marcus began tracking AI's market implications in 2016, noticing AI-related patent filings accelerating ahead of earnings upgrades before most of the sell-side had caught on. A former fixed-income quantitative analyst, he spent two decades building models that priced risk across emerging markets before pivoting to cover the economic impact of AI full-time. His writing translates opaque technical developments into clear risk/reward terms — and he's rarely diplomatic about the gap between AI valuations and underlying fundamentals. He believes most market participants still underestimate AI's long-run deflationary effect on knowledge work.
9 May 2025