OpenAI trained o1 and o3 to 'think' about its safety policy | TechCrunch

Dec 22, 2024 - techcrunch.com
OpenAI has introduced o3, a new family of AI reasoning models that it claims is more advanced than predecessors such as o1. The improvements are attributed to scaling test-time compute and to a new safety paradigm called "deliberative alignment," in which the models re-prompt themselves with the text of OpenAI's safety policy during inference. By deliberating internally over how to answer a question safely, the models refuse unsafe prompts more reliably while reducing over-refusal of benign questions on sensitive topics.
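The mechanism can be pictured roughly as follows. This is a minimal illustrative sketch, not OpenAI's implementation: deliberative alignment is trained into the model rather than exposed as an API, so the `generate` function below is a hypothetical stand-in for a language-model call, and the policy text is simply injected into a hidden deliberation step before the final answer is produced.

```python
# Illustrative sketch only: deliberative alignment is baked into o1/o3 during
# training, not exposed as an API. This shows the core idea from the article:
# before answering, the model "re-prompts" itself with the text of the safety
# policy and deliberates over it, and the final answer is conditioned on that
# hidden deliberation. `generate` is a hypothetical stand-in, not a real
# OpenAI endpoint.

SAFETY_POLICY = (
    "Excerpt of the written safety specification the model consults, e.g. "
    "rules for refusing requests that facilitate harm."
)

def generate(prompt: str) -> str:
    # Placeholder for an actual language-model call (assumption).
    raise NotImplementedError("plug in a real model here")

def answer_with_deliberation(user_prompt: str) -> str:
    # Step 1: hidden deliberation with the policy text in context.
    deliberation = generate(
        f"Safety policy:\n{SAFETY_POLICY}\n\n"
        f"User request:\n{user_prompt}\n\n"
        "Think step by step about whether and how this request can be "
        "answered within the policy."
    )
    # Step 2: the final, user-visible answer is conditioned on that
    # deliberation, so refusals or safe completions follow the policy.
    return generate(
        f"Deliberation:\n{deliberation}\n\n"
        f"Final answer to the user request:\n{user_prompt}"
    )
```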

OpenAI's approach to aligning AI models includes using synthetic data generated by internal AI models for supervised fine-tuning and reinforcement learning, rather than relying on human-written examples. This method aims to efficiently teach models to reference safety policies without high latency or compute costs. The o3 model is expected to be released in 2025, and OpenAI believes that deliberative alignment could be crucial for ensuring AI models adhere to human values as they become more powerful and autonomous.
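The synthetic-data step can be sketched in a similarly hedged way: one internal model drafts policy-citing reasoning for a set of prompts, a second "judge" model scores each draft against the policy, and only high-scoring examples are kept as fine-tuning data. The names below (`build_sft_dataset`, `draft_model`, `judge_model`) are hypothetical placeholders, not OpenAI's actual pipeline.

```python
# Hedged sketch of the synthetic-data idea described above, not OpenAI's
# actual pipeline. One internal model drafts policy-referencing reasoning,
# a second "judge" model scores it against the policy, and only high-scoring
# examples become supervised fine-tuning data, so no human-written examples
# are required. All names are hypothetical.

def build_sft_dataset(prompts, policy_text, draft_model, judge_model,
                      threshold=0.8):
    dataset = []
    for prompt in prompts:
        # An internal model drafts reasoning that cites the policy.
        draft = draft_model(
            f"Safety policy:\n{policy_text}\n\nPrompt:\n{prompt}\n\n"
            "Write reasoning that references the relevant policy sections, "
            "then a final answer."
        )
        # Another internal model grades how well the draft follows the policy.
        score = judge_model(policy_text, prompt, draft)
        if score >= threshold:
            dataset.append({"prompt": prompt, "completion": draft})
    return dataset
```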

Key takeaways:

• OpenAI introduced a new family of AI reasoning models, o3, which are claimed to be more advanced than previous models, using a novel safety paradigm called "deliberative alignment" to improve alignment with human values.
• Deliberative alignment involves the AI models re-prompting themselves with OpenAI's safety policy during inference, enhancing their ability to refuse unsafe prompts while answering benign ones.
• OpenAI used synthetic data, generated by internal AI models, for post-training processes like supervised fine-tuning and reinforcement learning, aiming for a scalable approach to alignment without human-written data.
• The o3 model is expected to be publicly available in 2025, and deliberative alignment is seen as a potential method to ensure AI models adhere to human values as they become more powerful.