Training AI

John Gruber argues against the idea that public data should be freely available for AI training on an opt-out basis. He emphasizes that the act of publishing content online does not automatically make it available for AI training. He asserts that the term "public web" implies free access, not free use. Gruber also points out that people often post content online that they do not own, and therefore, they should not have the right to decide whether this content can be used for AI training.

Gruber further expresses his concern about the "opt-out" style of AI training, where the decision to disallow AI-training web crawlers lies with someone other than the content owner. He highlights the problem of people posting content they do not own on platforms they also do not own, such as social media, where they may lack the knowledge or the power to disallow AI training. He concludes by expressing his frustration at having to constantly block these bots on servers he controls.
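As a rough illustration of what opting out looks like in practice, the sketch below checks a robots.txt file with Python's standard urllib.robotparser module. It is not taken from Gruber's post, which does not say how he blocks bots. The crawler name "ExampleAIBot", the example.com URL, and the helper may_fetch are all hypothetical. Note that robots.txt is a voluntary convention: it only works if the crawler chooses to honor it, and only the person who controls the server can publish it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts a site out of one AI-training crawler.
# "ExampleAIBot" is a made-up user-agent token, not a real crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The opted-out crawler is refused; other agents are still allowed.
    print(may_fetch("ExampleAIBot", "https://example.com/post"))
    print(may_fetch("SomeBrowser", "https://example.com/post"))
```

The burden here falls entirely on the server operator, which is the arrangement the summary describes Gruber objecting to: someone who posts on a platform they do not own has no robots.txt to edit.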

Key takeaways

Public data should not be made available for AI training on an opt-out basis, because that approach disregards ownership and copyright.
Publishing content on the web does not make it free to use for AI training; "public" means free to access, not free to use.
People often post content they don't own, and they shouldn't have the right to decide whether this content can be used for AI training.
Content owners have little control over whether their work, once reposted by others, is used for AI training, especially on platforms such as social media that neither they nor the reposter own.
