To counter this, researchers are considering ways to protect the 'humanity' of crowdsourced data, along with the idea of using 'standardized' datasets curated by humans. However, distinguishing human-generated data from synthetic content, and filtering out the latter, is a complex task. Falling back on historical data is not seen as a solution either, since it cannot reflect a changing world. The article concludes by asking whether an image edited with generative AI counts as AI-generated.
Key takeaways:
- AI-generated content is becoming increasingly prevalent on the internet, raising concerns that it could inadvertently introduce errors into new AI models trained on it, a phenomenon referred to as 'model collapse'.
- Researchers have found that even a small amount of AI-generated text in a training dataset can be 'poisonous' to the model being trained, with errors building up in each succeeding generation of models (see the toy sketch after this list).
- There are fears that this could exacerbate existing biases in AI models, particularly against marginalized groups, and reduce the diversity of AI output.
- Researchers are considering ways to protect the 'humanity' of crowdsourced data and are exploring the idea of using 'standardized' datasets curated by humans to avoid the influence of generative AI.
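To make the generational error build-up concrete, here is a minimal toy sketch (not from the article, and a deliberate simplification of real model collapse): each 'generation' fits a Gaussian to samples produced by the previous generation's fitted model, so finite-sample estimation error compounds instead of averaging out.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Generation 0: stand-in for human-written data, drawn from a
# known 'true' distribution (a standard normal).
data = rng.normal(loc=0.0, scale=1.0, size=200)

for gen in range(20):
    # 'Train' a model on the current data: here the model is just a
    # Gaussian whose parameters are estimated from the samples.
    mu, sigma = data.mean(), data.std()
    print(f"generation {gen:2d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")

    # Each successive generation trains only on the previous model's
    # synthetic output, so estimation error compounds: sigma drifts,
    # and the tails of the original distribution are progressively lost.
    data = rng.normal(loc=mu, scale=sigma, size=200)
```

Run over enough generations, the fitted sigma typically wanders away from the true value of 1.0 and, over long horizons, tends to shrink, mirroring the loss of diversity and tail behavior described in the takeaways above.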