New York Times, CNN and ABC block OpenAI’s GPTBot web crawler from accessing content

Several major news outlets including the New York Times, CNN, Reuters, and the Australian Broadcasting Corporation (ABC) have blocked OpenAI's web crawler, GPTBot, from accessing their content. The tool, used to improve the company's AI models, has been disallowed by these outlets due to concerns over the presence of copyrighted material in their datasets. The block can be seen in the publishers' robots.txt files, which dictate what pages web crawlers can visit.

The move comes as news outlets globally grapple with the use of AI in news gathering and the potential use of their content in AI system training. Some outlets have called for regulation of AI, including transparency about training sets and consent for the use of copyrighted material. Meanwhile, Google has proposed that AI systems should be able to scrape publisher's work unless they explicitly opt out.

Key takeaways

Several news outlets including the New York Times, CNN, Reuters and the Australian Broadcasting Corporation have blocked a tool from OpenAI, known as GPTBot, from accessing their content.
GPTBot is a web crawler used by OpenAI to scan webpages and improve its AI models, such as the chatbot ChatGPT.
The block on GPTBot can be seen in the robots.txt files of the publishers, which dictate what pages crawlers from search engines and other entities are allowed to visit.
News outlets globally are grappling with decisions about the use of AI in news gathering and the potential use of their content in AI system training pools.

New York Times, CNN and ABC block OpenAI’s GPTBot web crawler from accessing content

Key takeaways

Discussion (0)