Websites use robots.txt to specify which of their pages web crawlers may access, but Anthropic has been accused of ignoring these directives and scraping data regardless. The accusation is notable because the company was founded by former OpenAI researchers with the stated aim of developing "responsible" AI systems. Overly aggressive web crawlers are reportedly a common problem across the AI industry, prompting calls for AI companies to be more respectful in their data-gathering practices.
Key takeaways:
- Anthropic has reportedly been aggressively scraping websites to train its Claude LLM, with or without permission.
- Anthropic's ClaudeBot has been reported to hit individual sites millions of times in a short period, putting significant strain on their resources.
- Although websites use robots.txt to indicate which data crawlers may access, Anthropic reportedly ignores it and takes the data anyway.
- Anthropic was founded by former OpenAI researchers with the promise of developing "responsible" AI systems, but its current practices are being questioned.
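For context on the mechanism at issue: robots.txt is a plain-text file served at a site's root that names crawlers by their user-agent string and lists paths they may or may not fetch. A compliant crawler checks it before every request. The sketch below is illustrative, assuming "ClaudeBot" as the crawler's user-agent string and an example.com file that blocks it site-wide; it is not any specific site's actual configuration.

```python
from urllib import robotparser

# Hypothetical robots.txt: block ClaudeBot everywhere, allow other crawlers.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler consults can_fetch() before requesting each URL.
print(parser.can_fetch("ClaudeBot", "https://example.com/articles/1"))      # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/articles/1"))   # True
```

The key point is that robots.txt is purely advisory: nothing technically prevents a crawler from skipping this check, which is exactly the behavior Anthropic is accused of.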