However, the approach is not flawless: CriticGPT can itself make mistakes. Even so, the technique could make OpenAI's models and tools like ChatGPT more accurate by reducing the errors human trainers introduce during training. It could also prove crucial in making AI models smarter, potentially allowing humans to train an AI that exceeds their own abilities. The new technique is part of a broader effort to improve large language models and to ensure that AI behaves in acceptable ways as it becomes more capable.
Key takeaways:
- OpenAI has developed a new model, CriticGPT, to assist human trainers in assessing code; it has proven more effective at catching bugs and producing useful critiques of code than trainers working unaided.
- The company is looking to extend this approach beyond code and integrate it into its reinforcement learning from human feedback (RLHF) chat stack, which could make AI models and tools like ChatGPT more accurate by reducing errors made by human trainers.
- The new technique could also be crucial in helping AI models become smarter, potentially allowing humans to train an AI that exceeds their own abilities.
- The development is part of a larger effort to improve large language models, ensure AI systems behave acceptably as they become more capable, and make their output more trustworthy and aligned with human values.