The rise in AI scrapers ignoring "robots.txt" files is part of a broader trend threatening open internet infrastructure, with developers and companies like Cloudflare attempting to counteract the impact. However, this ongoing battle could push publishers to implement logins and paywalls, potentially restricting access to open web content. The situation highlights the growing tension between maintaining open access to information and managing the costs and risks associated with increased automated traffic.
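The robots.txt mechanism at the center of this dispute is purely advisory: a site publishes rules per user agent, and well-behaved crawlers check them before fetching. The sketch below, using Python's standard `urllib.robotparser`, shows how a publisher might try to opt out of AI crawlers while leaving the site open to everyone else (the robots.txt content, crawler tokens, and `example.org` URL are illustrative assumptions, not taken from the article):

```python
from urllib import robotparser

# Hypothetical robots.txt a publisher might serve to turn away
# known AI crawlers while keeping the rest of the site open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant crawler performs this check before every fetch;
# the scrapers described in the article simply skip it.
print(rp.can_fetch("GPTBot", "https://example.org/page"))       # False
print(rp.can_fetch("SomeBrowser", "https://example.org/page"))  # True
```

Because nothing enforces these rules, compliance is voluntary, which is why publishers are turning to server-side measures such as logins, paywalls, and crawler-blocking services instead.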
Key takeaways:
- Bandwidth consumption for multimedia downloads from Wikimedia Commons has surged by 50% since January 2024, primarily due to automated scrapers collecting training data for AI models.
- Bots generate almost two-thirds of the most resource-intensive traffic on Wikimedia Commons, despite accounting for only 35% of overall pageviews.
- The Wikimedia Foundation is spending significant resources to block crawlers and prevent disruption for regular users, highlighting a growing trend threatening the open internet.
- Some tech companies are developing solutions to slow down AI scrapers, but the issue may push publishers to implement logins and paywalls.