
Aaron created a tarpit, a deliberately slow trap designed to ensnare and exhaust AI scrapers, to protect websites from relentless crawling and to push back against crawlers that ignore robots.txt.
Last summer, Anthropic's ClaudeBot AI crawler sparked a significant backlash when it was accused of hitting websites with over a million requests per day. The controversy wasn't isolated; other AI companies were also under scrutiny for allegedly ignoring robots.txt instructions, which tell crawlers which parts of a site they should not scrape. Reddit’s CEO even called out these crawlers, labeling them “a pain in the ass to block,” even though the long-standing industry norm is to honor no-scraping directives.
Amid this turmoil, a software developer we'll call Aaron decided to take matters into his own hands. Frustrated by Facebook’s crawler hitting his site with over 30 million requests, Aaron developed a new kind of tarpit called Nepenthes, named after a genus of carnivorous pitcher plants that trap and digest their prey.
Tarpitting is a defensive technique that slows down or traps unwanted bots by making them waste time and resources. Traditionally used against email spammers, the tactic has now been adapted to web scraping. Nepenthes is software that site owners deploy on their own servers to create a digital tarpit aimed specifically at AI crawlers; Aaron himself describes it as deliberately malicious software.
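To make the mechanics concrete, here is a minimal Python sketch of the two ideas a web tarpit typically combines: responses that arrive very slowly, and pages whose links only lead to more generated pages. It is an illustration under stated assumptions, not Nepenthes's actual code; the names `fake_page`, `drip`, and `MAZE_PREFIX`, along with the chunk size and delay, are hypothetical.

```python
import hashlib
import random
import time
from typing import Iterator

# Hypothetical URL prefix for the maze; not taken from Nepenthes itself.
MAZE_PREFIX = "/maze"


def fake_page(path: str, n_links: int = 5) -> str:
    """Build a deterministic junk page whose links lead deeper into the maze."""
    # Seed the generator from the path so the same URL always returns the
    # same page, which makes the maze resemble a static site.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    links = [f"{MAZE_PREFIX}/{rng.getrandbits(48):012x}" for _ in range(n_links)]
    items = "\n".join(f'<li><a href="{href}">{href}</a></li>' for href in links)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


def drip(body: str, chunk_size: int = 16, delay: float = 1.0) -> Iterator[bytes]:
    """Yield the response a few bytes at a time, pausing after each chunk."""
    data = body.encode()
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
        time.sleep(delay)  # each pause keeps the crawler's connection tied up
```

A server would stream `drip(fake_page(request_path))` as the response body for any URL under the maze prefix. Because every link on a generated page points back into the maze, a crawler that follows links without limits never runs out of pages to fetch.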
Aaron warns that Nepenthes is aggressive and should only be used by site owners comfortable with the ethical implications. The software is not for those who are hesitant about trapping AI crawlers and potentially poisoning the data they collect.

Aaron says that, so far, Nepenthes has trapped all the major web crawlers, with one notable exception: OpenAI’s crawler has managed to escape. Why is unclear; the crawler may have more sophisticated trap detection, or it may adhere more closely to robots.txt guidelines.
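The robots.txt explanation is easy to illustrate. The sketch below uses Python's standard `urllib.robotparser` to show how a compliant crawler would steer clear of a tarpit, assuming the site owner has disallowed the tarpit path in robots.txt. The `/maze/` path, the `ExampleBot` user agent, the example.com URLs, and the `may_fetch` helper are all hypothetical, and nothing here describes how any real crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in which the site owner fences off the tarpit path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /maze/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules allow this user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler checks before every request and skips disallowed URLs.
print(may_fetch("ExampleBot", "https://example.com/articles/tarpits"))
print(may_fetch("ExampleBot", "https://example.com/maze/3fa9"))
```

Under these assumptions, only crawlers that skip this check would ever wander into the maze.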
The broader impact of these tarpits is still uncertain. They can certainly slow down and frustrate AI scrapers, but whether the junk content they serve has any lasting effect on AI models is not yet clear. Laxmi Korada, Microsoft’s director of partner technology, published a report last May discussing data poisoning and how leading GenAI providers are addressing it. How effective tarpits are as a poisoning tool, however, has not been established.
The use of tarpits raises significant ethical questions. On one hand, site owners have a right to control who accesses their content and how. On the other hand, deploying deliberately hostile software could escalate tensions and lead to unintended consequences, such as collateral damage to legitimate bots or users.
As the debate over AI web scraping continues, tarpits like Nepenthes offer a new tool for site owners to protect their data. Whether the tactic will become widespread, and how it will affect the broader ecosystem of web content, are open questions. For now, Aaron’s creation stands as a stark reminder of the ongoing struggle between AI companies and the websites they seek to scrape.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
4 February 2025