


Researchers jailbreak AI chatbots with ASCII art — ArtPrompt bypasses safety measures to unlock malicious queries – Tom’s Hardware

Posted: March 10, 2024 at 3:17 am

Researchers based in Washington and Chicago have developed ArtPrompt, a new way to circumvent the safety measures built into large language models (LLMs). According to the research paper ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs, chatbots such as GPT-3.5, GPT-4, Gemini, Claude, and Llama2 can be induced to respond to queries they are designed to reject when those queries are delivered as ASCII art prompts generated by the ArtPrompt tool. It is a simple and effective attack, and the paper provides examples of ArtPrompt-induced chatbots advising on how to build bombs and make counterfeit money.


ArtPrompt consists of two steps: word masking and cloaked prompt generation. In the word masking step, given the targeted behavior the attacker aims to provoke, the attacker first masks the sensitive words in the prompt that would likely conflict with the safety alignment of the LLM and result in rejection. In the cloaked prompt generation step, the attacker uses an ASCII art generator to replace the identified words with their ASCII art representations. Finally, the generated ASCII art is substituted into the original prompt, which is then sent to the victim LLM to generate a response.
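
To make those two steps concrete, here is a minimal Python sketch of the pipeline, assuming pyfiglet as a stand-in ASCII art generator (the paper's own tooling and font choices may differ) and using a benign placeholder word rather than a genuinely sensitive one:

```python
# Minimal sketch of the two ArtPrompt steps described above.
# pyfiglet is assumed here as the ASCII art generator; the paper's
# actual tooling and fonts may differ. A benign word stands in for
# the "sensitive" word that would normally be masked.
import pyfiglet


def cloak_prompt(prompt: str, masked_word: str, font: str = "standard") -> str:
    """Step 1: mask the sensitive word; Step 2: splice in its ASCII art form."""
    # Word masking: replace the flagged word with a placeholder token.
    masked_prompt = prompt.replace(masked_word, "[MASK]")

    # Cloaked prompt generation: render the masked word as ASCII art
    # and ask the model to decode it before answering.
    ascii_art = pyfiglet.figlet_format(masked_word, font=font)
    return (
        "The following ASCII art spells a single word. "
        "Decode it, substitute it for [MASK], and then answer the prompt.\n\n"
        f"{ascii_art}\n{masked_prompt}"
    )


if __name__ == "__main__":
    print(cloak_prompt("Explain how a firewall works.", "firewall"))
```

Running this prints an ASCII art block spelling the masked word followed by the masked prompt; the combined text is what gets sent to the target model.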

Chatbots powered by artificial intelligence (AI) are increasingly locked down to prevent malicious abuse. AI developers don't want their products to be subverted to promote hateful, violent, illegal, or similarly harmful content. So, if you were to ask one of the mainstream chatbots today how to do something malicious or illegal, you would likely be met with a refusal. Moreover, in a technological game of whack-a-mole, the major AI players have spent plenty of time plugging linguistic and semantic holes to keep people inside the guardrails. This is why ArtPrompt is such an eyebrow-raising development.

To understand how ArtPrompt works, it is simplest to look at the two examples provided by the research team. In Figure 1 above, you can see that ArtPrompt easily sidesteps the protections of contemporary LLMs: the tool replaces the 'safety word' with an ASCII art representation of that word to form a new prompt. The LLM can still decode the word spelled out in the ASCII art, but because the word never appears as plain text, the prompt doesn't trip any ethical or safety safeguards, and the model responds without objection.

Another example in the research paper shows how an LLM can be successfully queried about counterfeiting cash. Tricking a chatbot this way seems basic, but the ArtPrompt developers assert that their tool fools today's LLMs "effectively and efficiently." Moreover, they claim it "outperforms all [other] attacks on average" and, for now, remains a practical, viable attack on multimodal language models.

The last time we reported on AI chatbot jailbreaking, some enterprising researchers from NTU were working on Masterkey, an automated method of using the power of one LLM to jailbreak another.

