
AI coding tools are sparking debate over their ability to perform "clean room" implementations, raising questions about intellectual property and the ethics of recreating software from scratch.
Over the past few months, coding agents (AI systems designed to write code) have demonstrated an impressive ability to create what can be loosely termed a "clean room" implementation of existing software. This technique involves reverse-engineering the functionality of a piece of software and then reimplementing it from scratch, often under a different license. The most famous historical example is Compaq’s 1982 creation of a clean-room clone of IBM's BIOS, where one team reverse-engineered the BIOS to create a specification, which was then handed to another team for a ground-up implementation.
The recent controversy surrounding the Python library chardet has brought this issue to the forefront. Chardet, originally created by Mark Pilgrim in 2006, was released under the LGPL (Lesser General Public License). After Pilgrim retired from public internet life in 2011, maintenance of chardet was taken over by others, notably Dan Blanchard, who has been responsible for every release since version 1.1 in July 2012.
Two days ago, Dan released chardet 7.0.0, which includes the following note in the release notes:
Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API, drop-in replacement for chardet 5.x/6.x. Just way faster and more accurate!
Yesterday, Mark Pilgrim opened an issue on the chardet GitHub repository titled "No right to relicense this project." In his post, he thanked the current maintainers and contributors but raised serious concerns about the relicensing:
First off, I would like to thank the current maintainers and everyone who has contributed to and improved this project over the years. Truly a Free Software success story.
However, it has been brought to my attention that, in the release 7.0.0, the maintainers claim to have the right to “relicense” the project. They have no such right; doing so is an explicit violation of the LGPL. Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a “complete rewrite” is irrelevant, since they had ample exposure to the originally licensed code (i.e., this is not a “clean room” implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights.

Dan Blanchard responded with a lengthy comment addressing Mark's concerns. He argued that the new version of chardet was indeed a ground-up rewrite and that no direct copying of the original codebase occurred. He also emphasized that the use of AI tools to assist in the development process does not violate the spirit or letter of the LGPL, as the AI systems did not have access to the original source code during their operation.
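A claim that "no direct copying" occurred can be checked, at least roughly, by measuring textual similarity between the old and new code. The sketch below uses Python's standard `difflib` for this. It is an illustration only: the article does not say what method, if any, the maintainers used, and the two snippets (`OLD`, `NEW`) and the helper `similarity_ratio` are hypothetical stand-ins, not real chardet code.

```python
import difflib


def similarity_ratio(old_source: str, new_source: str) -> float:
    """Return a 0..1 line-level similarity score between two source strings."""
    # Compare non-blank lines with surrounding whitespace stripped, so that
    # indentation or spacing changes alone do not hide copied lines.
    old_lines = [ln.strip() for ln in old_source.splitlines() if ln.strip()]
    new_lines = [ln.strip() for ln in new_source.splitlines() if ln.strip()]
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return matcher.ratio()


# Hypothetical snippets standing in for an old and a rewritten implementation.
OLD = """
def detect(data):
    prober = UniversalDetector()
    prober.feed(data)
    prober.close()
    return prober.result
"""

NEW = """
def detect(data):
    scores = score_candidates(data)
    best = max(scores, key=scores.get)
    return {"encoding": best, "confidence": scores[best]}
"""

if __name__ == "__main__":
    print(f"old vs new: {similarity_ratio(OLD, NEW):.2f}")
    print(f"old vs old: {similarity_ratio(OLD, OLD):.2f}")
```

A low score only shows that the text differs. It does not settle whether a rewrite is a derivative work, and it says nothing about Pilgrim's actual objection, which concerns the authors' exposure to the original code rather than line-by-line copying.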
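This controversy raises ethical and legal questions, among them whether a rewrite by maintainers with long exposure to the original code can count as a clean-room implementation, and whether using an AI code generator changes that answer.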
On the technical side, the release notes describe the rewritten chardet as not only faster but also more accurate. If that holds, it suggests that significant optimization and algorithmic improvements were made during the rewrite.
Original Sources
↗ https://simonwillison.net/2026/Mar/5/chardet/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
6 March 2026