Artificial intelligence chatbots are typically designed to reject queries involving harmful or illegal activities. However, ArtPrompt has proven capable of circumventing these protections, even coaxing chatbots into advising on illegal activities such as building bombs and counterfeiting money. The researchers claim that ArtPrompt outperforms other jailbreak attacks and remains a practical attack against multimodal language models.
Key takeaways:
- Researchers have developed a tool called ArtPrompt that can bypass the safety measures of large language models (LLMs) such as GPT-3.5, GPT-4, Gemini, Claude, and Llama2.
- ArtPrompt works by masking sensitive words in a prompt and replacing them with ASCII art, which does not trigger the safety alignment of LLMs, allowing the model to respond to queries that would normally be rejected (see the sketch after this list).
- The tool has been shown to induce chatbots to provide advice on illegal activities, such as building bombs and counterfeiting money.
- Despite efforts by AI developers to prevent malicious use of their products, tools like ArtPrompt present a significant challenge, as they can easily sidestep current safety measures.
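To make the masking step concrete, here is a minimal, benign sketch of how a word could be swapped for an ASCII-art rendering. It assumes the open-source pyfiglet library; the researchers' actual tooling is not described in this summary, and the example deliberately uses a harmless word and omits any jailbreak prompt template.

```python
# Minimal sketch of the word-masking idea only (assumed dependency: pip install pyfiglet).
# It renders a harmless placeholder word as ASCII art and splices it into a prompt
# in place of the plaintext word.
import pyfiglet


def mask_word_as_ascii_art(prompt: str, word: str) -> str:
    """Replace `word` in `prompt` with a multi-line ASCII-art rendering of that word."""
    ascii_art = pyfiglet.figlet_format(word)        # returns the word drawn as ASCII art
    return prompt.replace(word, "\n" + ascii_art)   # naive substitution, for illustration only


if __name__ == "__main__":
    # Harmless example: "banana" stands in for whichever term would be masked.
    print(mask_word_as_ascii_art("Tell me a story about a banana.", "banana"))
```

The point of the sketch is that the plaintext keyword no longer appears in the prompt, which is why, according to the researchers, keyword-based safety alignment fails to flag the request.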