Microsoft-affiliated research finds flaws in GPT-4

A new scientific paper affiliated with Microsoft has examined the trustworthiness and toxicity of large language models (LLMs) such as OpenAI's GPT-4 and GPT-3.5. The study found that GPT-4, while generally more trustworthy than GPT-3.5, is more susceptible to "jailbreaking" prompts that bypass its safety measures, leading it to produce toxic and biased text. The researchers also found that GPT-4 is more likely to leak private, sensitive data when given certain prompts.

Despite these findings, Microsoft has confirmed that these vulnerabilities do not impact its customer-facing services, as finished AI applications use various mitigation approaches. The research has been shared with OpenAI, which has acknowledged the potential vulnerabilities. The researchers have also open-sourced their code on GitHub to encourage further study and to pre-empt potential exploitation of these vulnerabilities.

Key takeaways:

A new scientific paper affiliated with Microsoft has found that large language models (LLMs) like OpenAI's GPT-4 can be prompted to produce toxic, biased text, especially when given 'jailbreaking' prompts that bypass the model's safety measures.
Despite GPT-4 being generally more trustworthy than its predecessor, GPT-3.5, on standard benchmarks, it is more vulnerable to these jailbreaking prompts, potentially due to its tendency to follow instructions more precisely.
The research team worked with Microsoft product groups to ensure that the vulnerabilities identified do not impact current customer-facing services, and have shared their findings with OpenAI.
The researchers have open sourced the code they used to benchmark the models on GitHub, with the aim of encouraging others in the research community to build upon their work and potentially pre-empt harmful exploitation of these vulnerabilities.

Microsoft-affiliated research finds flaws in GPT-4 | TechCrunch

Key takeaways:

Comments (0)

Newsletter