The rise in AI scrapers ignoring "robots.txt" files is part of a broader trend threatening open internet infrastructure, with developers and companies like Cloudflare attempting to counteract the impact. However, this ongoing battle could push publishers to implement logins and paywalls, potentially restricting access to open web content. The situation highlights the growing tension between maintaining open access to information and managing the costs and risks associated with increased automated traffic.
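The robots.txt mechanism at the center of this dispute is purely advisory: a site publishes rules per user agent, and well-behaved crawlers check them before fetching. The sketch below, using Python's standard `urllib.robotparser`, shows how a publisher might try to opt out of AI crawlers while leaving the site open to everyone else (the robots.txt content, crawler tokens, and `example.org` URL are illustrative assumptions, not taken from the article):

```python
from urllib import robotparser

# Hypothetical robots.txt a publisher might serve to turn away
# known AI crawlers while keeping the rest of the site open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant crawler performs this check before every fetch;
# the scrapers described in the article simply skip it.
print(rp.can_fetch("GPTBot", "https://example.org/page"))       # False
print(rp.can_fetch("SomeBrowser", "https://example.org/page"))  # True
```

Because nothing enforces these rules, compliance is voluntary, which is why publishers are turning to server-side measures such as logins, paywalls, and crawler-blocking services instead.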
Key takeaways:
- Bandwidth consumption for multimedia downloads from Wikimedia Commons has surged by 50% since January 2024, primarily due to automated scrapers collecting training data for AI models.
- Bots generate almost two-thirds of the most resource-intensive traffic on Wikimedia Commons, despite accounting for only 35% of overall pageviews.
- The Wikimedia Foundation is spending significant resources to block crawlers and prevent disruption for regular users, highlighting a growing trend threatening the open internet.
- Some tech companies are developing solutions to slow down AI scrapers, but the issue may push publishers to implement logins and paywalls.