The study found that AI image generators can be made to interpret nonsense words as commands to produce images, including content that is not safe for work. Because the systems' safety filters do not associate these nonsense strings with forbidden terms, questionable content slips through. The researchers also discovered that the models can read ordinary words as stand-ins for other words depending on context. The findings highlight the potential for misuse of generative AI and the need for more robust safety measures.
Key takeaways:
- Researchers from Johns Hopkins University and Duke University have developed an algorithm, SneakyPrompt, that can trick AI art generators into producing inappropriate images, highlighting vulnerabilities in AI safety filters.
- The algorithm substitutes nonsense words or phrases that the AI nonetheless interprets as instructions to generate images, including pornographic or violent ones (see the sketch after this list).
- These findings reveal that generative AIs could be exploited to create disruptive content, such as images of real people appearing to engage in acts they never committed.
- The researchers aim to use these findings to make generative AIs more robust to adversaries and strengthen their safety filters.
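To make the attack idea concrete, here is a minimal sketch of how such a search for filter-bypassing nonsense tokens might look. It is not the researchers' SneakyPrompt implementation; the functions `safety_filter()` and `embedding_similarity()` are hypothetical stand-ins for the target system's content filter and text encoder, and the random search is a simplification of the paper's guided search.

```python
import random
import string

# Hypothetical stand-ins: a real attack would query the target model's own
# safety filter and text encoder. These placeholders are illustrative only.
def safety_filter(prompt: str) -> bool:
    """Return True if the prompt would be blocked (assumed interface)."""
    return "forbidden" in prompt.lower()

def embedding_similarity(a: str, b: str) -> float:
    """Return a similarity score between two prompts (assumed interface)."""
    return 1.0 if a == b else 0.5  # placeholder score

def random_token(length: int = 6) -> str:
    """Generate a nonsense candidate token, e.g. 'qzfblx'."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

def find_bypass(prompt: str, blocked_word: str, budget: int = 1000):
    """Search for a nonsense substitute for a blocked word that slips past
    the filter while keeping the prompt's meaning close to the original."""
    best, best_score = None, 0.0
    for _ in range(budget):
        candidate = prompt.replace(blocked_word, random_token())
        if safety_filter(candidate):
            continue  # still blocked, try another token
        score = embedding_similarity(candidate, prompt)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

The core point the sketch illustrates is that the filter and the image model judge prompts differently: a string the filter treats as meaningless can still land close enough to a forbidden concept in the model's embedding space to steer what it generates.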