The article also highlights the backlash from content creators and owners, who are increasingly blocking these bots from accessing their data. However, compliance with robots.txt is voluntary, and crawlers can simply ignore its directives. The article concludes with a warning that the internet could change dramatically if content creators stop posting information online to prevent their data from being used by AI models. This could turn the internet into a series of paywalled gardens, limiting access to knowledge and creativity.
Key takeaways:
- Web crawlers are collecting online information into giant datasets that tech companies use to develop AI models, shifting the crawlers' mission from supporting content creators to working against them.
- Blocking these crawlers is done by adding directives to a site's robots.txt file, a method that relies on voluntary compliance and can simply be ignored by crawlers (see the sketch after this list).
- Common Crawl, via its CCBot crawler, holds the largest repository of data ever collected from the internet, and that data is used by large corporations to build proprietary models.
- There is growing concern that the internet could become a series of paywalled gardens if content creators stop posting information online because their data is being used for free to train AI models.
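To illustrate why robots.txt blocking is voluntary, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the example.com URLs are made up for illustration; the point is that honoring the directives is a choice the crawler makes, not something the site can enforce.

```python
# Minimal sketch: how a robots.txt directive is meant to work.
# The robots.txt content and example.com URLs below are illustrative only.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks before fetching a page.
print(parser.can_fetch("CCBot", "https://example.com/articles/post-1"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/articles/post-1"))  # True

# Nothing enforces this check: a crawler that never calls can_fetch() can
# still download the page, which is why the article calls the mechanism voluntary.
```

In practice, site owners add a `Disallow` rule for the specific user agent (such as CCBot) to their robots.txt; whether the crawler respects it is entirely up to the crawler's operator.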