The Ongoing Debate: Does GPL Propagate to AI Models Trained on Open Source Code?

Policy & Regulation

The Analyst

28 Nov 2025 · 4 min read

As GitHub Copilot ignited controversy over GPL's reach into AI models trained on open-source code, this piece explores ongoing debates and legal ambiguities surrounding copyleft licenses in machine learning frameworks.

When GitHub Copilot was launched in 2021, it sparked significant debate due to its training data, which included a vast amount of open-source code from GitHub. One of the most contentious issues was whether the conditions of copyleft licenses, such as the GNU General Public License (GPL), would propagate to the AI model itself, requiring that the entire model be released under the same license. While this theory was widely discussed initially, it has not been definitively resolved as of 2025. This article examines the current state of this debate and its implications for the legal landscape.

The Current Standing in Two Lawsuits

Doe v. GitHub (Copilot Class Action): The Persisting Claim of Open Source License Violation

One of the primary battlegrounds for this issue is the class action lawsuit, Doe v. GitHub. Filed by a group of developers, the lawsuit alleges that GitHub violated open-source licenses by training Copilot on code without proper attribution and compliance with license conditions. Despite ongoing legal proceedings, the plaintiffs maintain that the GPL's copyleft provisions should extend to AI models trained on GPL-licensed code. This case has drawn significant attention from both the legal community and software developers.

GEMA v. OpenAI: The Theory Treating “Memory” in Models as Legal Reproduction

Another critical lawsuit is GEMA v. OpenAI, where the German collecting society, GEMA, claims that AI models trained on copyrighted material infringe on copyright by "reproducing" the training data within their neural networks. This theory posits that the memory of the model constitutes a legal reproduction of the original code, thereby triggering the GPL's copyleft provisions. The case is still pending, and its outcome could have far-reaching implications for the use of open-source code in AI models.

Why It Matters

The debate over whether GPL propagates to AI models trained on open-source code has significant implications for both software developers and AI companies. For developers, the propagation of the GPL would mean that any AI model trained on their code must be released under the same license, potentially limiting commercial use. For AI companies, this could impose additional legal and compliance burdens, affecting their ability to monetize their models.

Key Risks

Legal Uncertainty: The ongoing lawsuits highlight the lack of clear legal precedents in this area. Until a definitive ruling is made, companies face significant uncertainty about the licensing requirements for AI models.
Commercial Impact: If the GPL does propagate to AI models, it could significantly impact the commercial viability of these models. Companies may need to reconsider their business models and licensing strategies to comply with open-source requirements.
Developer Trust: The debate has also raised concerns among developers about the ethical use of their contributions. Maintaining trust in the open-source community is crucial for continued collaboration and innovation.

The Opportunity

Despite the risks, there are opportunities for stakeholders to navigate this complex landscape:

Collaborative Licensing Models: Developers and AI companies could explore collaborative licensing models that balance the need for open-source contribution with commercial viability. For example, hybrid licenses that combine elements of copyleft and permissive licensing could be a viable solution.
Transparency and Attribution: Companies can build trust by being transparent about their training data sources and providing proper attribution to contributors. This approach can foster goodwill and reduce legal risks.
Legal Clarity through Advocacy: Engaging in advocacy efforts to clarify the legal status of AI models trained on open-source code could provide much-needed clarity for all stakeholders. Industry groups and legal experts can play a crucial role in shaping future regulations.

Conclusion

The debate over whether the GPL propagates to AI models trained on open-source code remains unresolved, with ongoing lawsuits and legal uncertainty. While this issue poses significant risks, it also presents opportunities for innovation and collaboration. As the legal landscape evolves, stakeholders must remain vigilant and proactive in addressing these challenges.