Top websites block Google from training AI models on their data. Nowhere near as much as OpenAI, though.

Mar 14, 2024 - businessinsider.com
Google has launched a tool called Google-Extended that allows websites to block the tech giant from using their content for training AI models. This comes as AI models are increasingly answering user queries directly, potentially reducing traffic to websites. The tool, which was released in September 2023, is being used by about 10% of the top 1,000 websites, including The New York Times, CNN, BBC, Yelp, and Business Insider. That is far less than the roughly 32% of the top 1,000 websites that block GPTBot, OpenAI's crawler.

The New York Times, which is in a copyright dispute with OpenAI, has also blocked the startup's access to its content. Originality.ai CEO Jonathan Gillham suggests that if Google launches a generative AI search engine, sites that have blocked Google's access to training data may not appear in AI-generated results. Google is currently testing an early version of such a search engine, called Search Generative Experience (SGE), but it is unclear if or when it will be fully launched.

Key takeaways:

  • Google has launched a new tool called Google-Extended that allows websites to block the company from using their content for training AI models.
  • About 10% of the top 1,000 websites are using the Google-Extended snippet to block Google from using their content for AI model training, including The New York Times, CNN, BBC, Yelp, and Business Insider.
  • Websites block Google-Extended less often than they block other AI-related crawlers, such as OpenAI's GPTBot and Common Crawl's CCBot.
  • Google is testing an early version of a generative AI search engine through its Search Generative Experience (SGE), but it's unclear if the company will fully launch this in the future.
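
As a rough illustration of how such a block can be checked, the sketch below assumes the "snippet" mentioned above is a robots.txt rule keyed on the Google-Extended, GPTBot, and CCBot user-agent tokens. The sample robots.txt contents and the helper name `blocked_ai_tokens` are hypothetical and are not taken from the article or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: it disallows the
# Google-Extended and GPTBot tokens and allows everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# User-agent tokens named in the article.
AI_TOKENS = ("Google-Extended", "GPTBot", "CCBot")


def blocked_ai_tokens(robots_txt, tokens=AI_TOKENS, url="https://example.com/"):
    """Return the tokens that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [token for token in tokens if not parser.can_fetch(token, url)]


if __name__ == "__main__":
    print(blocked_ai_tokens(SAMPLE_ROBOTS_TXT))
```

Running the same check against the robots.txt files of a list of sites would give a blocking rate per token, which is presumably close to how figures such as the 10% and 32% above are derived, though the article does not describe the exact methodology.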