Google adds a switch for publishers to opt out of becoming AI training data

Google has introduced Google-Extended, a new control that lets website publishers opt out of having their content used to train Google's AI models while remaining accessible in Google Search. Publishers can use it to manage whether their sites help improve Bard and the Vertex AI generative APIs. The control is set through robots.txt, the text file that tells web crawlers which parts of a site they may access.

The introduction of Google-Extended comes after Google's announcement in July that it is training its AI chatbot, Bard, on publicly available data from the web. It also follows a wave of sites, including prominent publishers like The New York Times, CNN, Reuters, and Medium, blocking the web crawler that OpenAI uses to collect data for training ChatGPT.

Key takeaways

Google has introduced a new tool called Google-Extended that allows website publishers to opt out of having their data used to train Google's AI models.
Sites that opt out can still be crawled and indexed by Googlebot, but their data will not be used to train current and future AI models.
Google-Extended lets publishers manage whether their sites help improve Bard and Vertex AI generative APIs and control access to content on their site.
The tool is controlled through robots.txt, a text file that tells web crawlers which parts of a site they may access.
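
As a rough illustration of the robots.txt opt-out described above, the sketch below blocks the Google-Extended user agent while leaving every other crawler, Googlebot included, unrestricted. It checks the rules with Python's standard-library robots.txt parser, which is only a stand-in for illustration and not how Google's own crawlers evaluate the file. The example.com URL and the can_fetch helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: opt the whole site out of Google-Extended,
# leave all other crawlers (including Googlebot) unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given user agent may fetch the URL under these rules.

    Hypothetical helper: uses the stdlib parser as an approximation of how
    a compliant crawler would read the file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, "allowed" if can_fetch(agent, url) else "blocked")
```

Under these rules, a publisher's pages stay available to Googlebot for Search, and only the Google-Extended token is disallowed. A site that wants to opt out only part of its content could, presumably, narrow the Disallow path instead of using "/".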
