The company acknowledges the complexity of setting these rules, particularly in matters of privacy and in defining what the AI should and shouldn't do. The instructions that steer the AI toward following the policy are also challenging to create, and there are bound to be failures as people find ways to circumvent them or discover unaccounted-for edge cases. While OpenAI isn't revealing all its strategies, it believes that sharing how these rules and guidelines are set will benefit users and developers.
Key takeaways:
- OpenAI is offering a limited look at the reasoning behind its models’ rules of engagement, which include sticking to brand guidelines and declining to produce NSFW content.
- Large language models (LLMs) have no naturally occurring limits on what they can or will say, which is why they need guardrails defining what they should and shouldn’t do.
- OpenAI is publishing what it calls its “model spec,” a collection of high-level rules that indirectly govern ChatGPT and other models.
- OpenAI states that developer intent is effectively the highest law: the model may decline to discuss anything the developer hasn’t approved, in order to prevent manipulation attempts.