Following the report, LAION took the dataset offline and released a statement affirming its zero-tolerance policy for illegal content. The organization is working with the Internet Watch Foundation (IWF) and others to find and remove links pointing to potentially unlawful content. The dataset has been used to train various AI systems, including Stability AI's popular Stable Diffusion image-generation model.
Key takeaways:
- An influential machine learning dataset, LAION-5B, contains thousands of suspected images of child sexual abuse, according to a report by Stanford University’s Internet Observatory.
- LAION-5B, maintained by the non-profit organization LAION, is an index of links to images on the web, not a stored collection of the images themselves.
- Researchers used PhotoDNA, a content-filtering tool developed by Microsoft, to identify 3,226 instances of suspected child sexual abuse material in the dataset.
- After receiving the report, LAION took the dataset offline and released a statement saying it has a zero-tolerance policy for illegal content and is working to remove links that may point to potentially unlawful material.