Following the report, LAION took the dataset offline and released a statement affirming its zero-tolerance policy for illegal content. The organization is working with the Internet Watch Foundation (IWF) and others to find and remove links pointing to potentially unlawful content. The dataset has been used to train various AI systems, including Stability AI's popular Stable Diffusion image-generation model.
Key takeaways:
- An influential machine learning dataset, LAION-5B, contains thousands of suspected images of child sexual abuse, according to a report by Stanford University’s Internet Observatory.
- LAION-5B, maintained by the non-profit organization LAION, is an index of links to images on the web, not a stored collection of the images themselves.
- Researchers used PhotoDNA, a content-filtering tool developed by Microsoft, to identify 3,226 instances of suspected child sexual abuse material in the dataset.
- After receiving the report, LAION took the dataset offline and released a statement saying it has a zero-tolerance policy for illegal content and is working to remove links that may point to potentially unlawful material.