However, the use of RBRs raises concerns about reduced human oversight and the potential for increased bias in the model. OpenAI acknowledges these ethical considerations and suggests combining RBRs with human feedback. The company began exploring RBR methods while developing GPT-4 and has faced criticism over its commitment to safety.
Key takeaways:
- OpenAI has introduced a new method for aligning AI models with safety policies, known as Rule-Based Rewards (RBRs), which automates part of model fine-tuning and reduces the time required to ensure a model does not produce unintended results.
- RBRs allow safety and policy teams to use an AI model that scores responses based on how closely they adhere to a set of rules the teams create (see the sketch after this list).
- While RBRs could reduce human oversight and potentially increase bias in the model, OpenAI argues the method actually cuts down on the subjectivity that human evaluators often introduce.
- OpenAI began exploring RBR methods while developing GPT-4 and has faced criticism about its commitment to safety, with key personnel leaving the company to focus on safe AI systems.
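To make the scoring idea concrete, below is a minimal, hypothetical sketch of a rule-based reward in Python. The rule propositions, keyword checks, and weights are invented for illustration; in OpenAI's actual setup, propositions are evaluated by a grader model rather than by simple string matching, and the resulting scores feed into reinforcement-learning fine-tuning.

```python
# Hypothetical sketch of a rule-based reward. Each rule pairs a
# human-readable policy proposition with a checker function; a response's
# reward is the normalized weighted sum of the rules it satisfies.
# The keyword checks below are illustrative stand-ins for a grader model.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    proposition: str              # policy proposition written by the safety team
    check: Callable[[str], bool]  # returns True if the response satisfies it
    weight: float = 1.0           # contribution to the total reward

# Illustrative rules for a "soft refusal" style of response.
RULES = [
    Rule("contains a brief apology",
         lambda r: "sorry" in r.lower()),
    Rule("states an inability to comply",
         lambda r: "can't" in r.lower() or "cannot" in r.lower()),
    Rule("avoids judgmental language",
         lambda r: "shameful" not in r.lower(), weight=2.0),
]

def rule_based_reward(response: str, rules: list[Rule] = RULES) -> float:
    """Score a response by how closely it adheres to the rules, in [0, 1]."""
    total = sum(rule.weight for rule in rules)
    earned = sum(rule.weight for rule in rules if rule.check(response))
    return earned / total

if __name__ == "__main__":
    print(rule_based_reward("I'm sorry, but I can't help with that."))  # 1.0
    print(rule_based_reward("That request is shameful."))               # 0.0
```

In this toy version, the reward is a simple weighted fraction of satisfied rules; a real pipeline would replace the checker functions with model-graded judgments and combine the score with other reward signals during fine-tuning.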