The author also raises concerns about the potential misuse of this technique. As it is relatively easy to hide messages in base64 encoding, it could be used as an attack vector by malicious actors. The author shares an anecdote where a colleague was confused when the AI seemed to know his name, demonstrating the potential for confusion or deception using this method.
Key takeaways:
- The recent publication on LLM 'sleeper agents' has led to further exploration on influencing LLMs to pursue alternative objectives without altering training data.
- GPT-4 can encode/decode base64 somewhat consistently, which can be used to hide/inject secret messages in prompts.
- Base64 encodings can be hidden in a code-question type example, and using GPT-4 Turbo, it is possible to get GPT to comply with the hidden message.
- Since it is quite simple to start hiding messages/prompts in Base64 encoding, it can reasonably be an attack vector for malicious actors.