In response, Read the Docs has temporarily blocked all traffic from bots identified as AI crawlers, begun monitoring bandwidth usage more closely, and started working on more aggressive rate-limiting rules. Even so, the additional bandwidth consumed by AI crawlers is likely to exhaust the platform's AWS credits early. Read the Docs is asking all AI companies to crawl sites more respectfully and to implement basic checks in their crawlers, and it is open to working with these companies on an arrangement that allows respectful crawling.
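The post does not describe the mitigations in detail, but blocking AI-crawler user agents combined with rate limiting might look roughly like the minimal Python sketch below. The user-agent list reflects commonly published crawler identifiers, while the 60-requests-per-minute threshold and the `should_block` helper are illustrative assumptions, not Read the Docs' actual configuration.

```python
# Illustrative sketch: reject requests whose User-Agent matches a known
# AI-crawler pattern, and apply a simple sliding-window rate limit otherwise.
import time
from collections import defaultdict

BLOCKED_AGENTS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider", "PerplexityBot")
MAX_REQUESTS_PER_MINUTE = 60  # assumed threshold for illustration

_request_log: dict[str, list[float]] = defaultdict(list)

def should_block(user_agent: str) -> bool:
    """Return True if the request should be rejected or rate limited."""
    # Outright block for agents currently identified as AI crawlers.
    if any(bot.lower() in user_agent.lower() for bot in BLOCKED_AGENTS):
        return True

    # Sliding-window rate limit for everything else.
    now = time.monotonic()
    window = _request_log[user_agent]
    window[:] = [t for t in window if now - t < 60]  # keep the last minute
    window.append(now)
    return len(window) > MAX_REQUESTS_PER_MINUTE

# Usage inside a request handler (framework-agnostic):
# if should_block(request.headers.get("User-Agent", "")):
#     return Response(status=429)
```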
Key takeaways:
- AI crawlers are causing significant problems for Read the Docs, a community-supported site that hosts documentation for many projects, by aggressively pulling content and driving up bandwidth charges.
- Examples of abuse include one crawler downloading 73 TB of zipped HTML files in May 2024, costing over $5,000 in bandwidth charges, and another crawler using Facebook's content downloader to pull 10 TB of data in June 2024.
- Read the Docs has taken steps to mitigate this abuse, including temporarily blocking all traffic from bots identified as AI crawlers, monitoring bandwidth usage more closely, and working on more aggressive rate-limiting rules.
- Read the Docs is calling on all AI companies to be more respectful when crawling sites, and suggests the possibility of building an integration that would notify crawlers of content changes so they download only the files that have changed (see the sketch below).
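One basic check a crawler could implement along these lines is conditional fetching with HTTP validators (`ETag` / `Last-Modified`), so unchanged pages are answered with a cheap 304 response instead of a full download. The sketch below uses the third-party `requests` library; the `polite_fetch` helper, the cache shape, and the politeness delay are hypothetical, not a published integration.

```python
# Hypothetical sketch of a respectful crawler: only re-download a page when
# the server says it has changed, and pause between requests.
import time
import requests

def polite_fetch(url: str, cache: dict, delay_seconds: float = 1.0) -> bytes | None:
    """Fetch url only if it changed since the last visit; otherwise return None."""
    headers = {"User-Agent": "example-crawler/0.1 (contact@example.com)"}

    cached = cache.get(url)
    if cached:
        # Send validators from the previous response so the server can
        # answer 304 Not Modified without re-sending the body.
        if cached.get("etag"):
            headers["If-None-Match"] = cached["etag"]
        if cached.get("last_modified"):
            headers["If-Modified-Since"] = cached["last_modified"]

    response = requests.get(url, headers=headers, timeout=30)
    time.sleep(delay_seconds)  # simple politeness delay between requests

    if response.status_code == 304:
        return None  # content unchanged; nothing was re-downloaded

    response.raise_for_status()
    cache[url] = {
        "etag": response.headers.get("ETag"),
        "last_modified": response.headers.get("Last-Modified"),
    }
    return response.content
```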