
As GitHub Copilot ignited controversy over GPL's reach into AI models trained on open-source code, this piece explores ongoing debates and legal ambiguities surrounding copyleft licenses in machine learning frameworks.
When GitHub Copilot launched in 2021, it sparked significant debate because its training data included a vast amount of open-source code from GitHub. One of the most contentious questions was whether the conditions of copyleft licenses, such as the GNU General Public License (GPL), propagate to the AI model itself, so that the entire model would have to be released under the same license. The theory was widely discussed at the time, but as of 2025 it has not been definitively resolved. This article examines the current state of the debate and its implications for the legal landscape.
One of the primary battlegrounds for this issue is the class action lawsuit Doe v. GitHub. Filed by a group of developers, the lawsuit alleges that GitHub violated open-source licenses by training Copilot on code without proper attribution or compliance with license conditions. The proceedings are ongoing, and the plaintiffs maintain that the GPL's copyleft provisions should extend to AI models trained on GPL-licensed code. The case has drawn significant attention from both the legal community and software developers.
Another closely watched lawsuit is GEMA v. OpenAI, in which the German music collecting society GEMA claims that AI models trained on copyrighted material infringe copyright by "reproducing" the training data within their neural networks. The theory holds that what a model memorizes constitutes a legal reproduction of the original work. Applied to GPL-licensed code, the same reasoning would treat a model as containing copies of that code, which is the premise for arguing that the GPL's copyleft provisions are triggered. The case is still pending, and its outcome could have far-reaching implications for the use of open-source code in AI models.
The debate over whether the GPL propagates to AI models trained on open-source code has significant implications for both software developers and AI companies. For developers, propagation would mean that any AI model trained on their GPL-licensed code must be released under the same license, potentially limiting commercial use. For AI companies, it could impose additional legal and compliance burdens and affect their ability to monetize their models.

Despite the risks, there are opportunities for stakeholders to navigate this complex landscape.
The debate over whether the GPL propagates to AI models trained on open-source code remains unresolved, with ongoing lawsuits and legal uncertainty. While this issue poses significant risks, it also presents opportunities for innovation and collaboration. As the legal landscape evolves, stakeholders must remain vigilant and proactive in addressing these challenges.
About the author
Marcus began tracking AI's market implications in 2016, noticing AI-related patent filings accelerating ahead of earnings upgrades before most of the sell-side had caught on. A former fixed-income quantitative analyst, he spent two decades building models that priced risk across emerging markets before pivoting to cover the economic impact of AI full-time. His writing translates opaque technical developments into clear risk/reward terms — and he's rarely diplomatic about the gap between AI valuations and underlying fundamentals. He believes most market participants still underestimate AI's long-run deflationary effect on knowledge work.
28 November 2025