Alongside the research update, OpenAI announced a new $10 million fund to support people working on superalignment. The company will offer grants of up to $2 million to university labs, nonprofits, and individual researchers, and one-year fellowships of $150,000 to graduate students. The move comes after a turbulent period for OpenAI, which saw CEO Sam Altman fired and then reinstated by its oversight board.
Key takeaways:
- OpenAI's superalignment team is working on techniques to prevent a hypothetical future superintelligence from going rogue. They are exploring how a less powerful model can supervise a more powerful one, which could be a step towards humans supervising superhuman machines.
- The team is focusing on aligning superhuman models, ensuring they do what humans want and avoid undesirable actions. Today's models are aligned using reinforcement learning from human feedback (RLHF), but this approach breaks down for superhuman models, whose actions might be beyond human understanding and therefore beyond human evaluation.
- In their research, the team used GPT-2 to supervise GPT-4, OpenAI's latest and most powerful model. The results were mixed: on language tasks, the weakly supervised GPT-4 outperformed its GPT-2 supervisor, but the approach fared worse on chess puzzles. The team concluded that the approach is promising but requires further development.
- OpenAI is encouraging more research into superalignment, announcing a new $10 million fund to support university labs, nonprofits, individual researchers, and graduate students working on this issue.
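The weak-to-strong setup in the takeaways above can be illustrated with a toy sketch. This is not OpenAI's actual method or code, just a minimal, self-contained analogy: a noisy "weak supervisor" labels data, a "strong student" with a better hypothesis class is trained only on those imperfect labels, and the student ends up more accurate than its teacher because the teacher's random errors average out. All names and parameters (the threshold rule, the 20% noise rate) are invented for illustration.

```python
import random

random.seed(0)

# Ground truth: label is 1 iff x > 0.
def true_label(x):
    return 1 if x > 0 else 0

# "Weak supervisor": knows the right rule but is noisy,
# flipping 20% of its labels at random.
def weak_label(x):
    lbl = true_label(x)
    return 1 - lbl if random.random() < 0.2 else lbl

# Training data labeled only by the weak supervisor.
train_x = [random.uniform(-1, 1) for _ in range(2000)]
weak_y = [weak_label(x) for x in train_x]

# "Strong student": searches for the threshold that best fits the
# weak labels. It never sees a true label during training.
candidates = [i / 100 for i in range(-100, 101)]
def fit_error(t):
    return sum((1 if x > t else 0) != y for x, y in zip(train_x, weak_y))
best_t = min(candidates, key=fit_error)

# Evaluate teacher and student against the true labels on held-out data.
test_x = [random.uniform(-1, 1) for _ in range(2000)]
teacher_acc = sum(weak_label(x) == true_label(x) for x in test_x) / len(test_x)
student_acc = sum((1 if x > best_t else 0) == true_label(x)
                  for x in test_x) / len(test_x)

print(f"weak teacher accuracy:  {teacher_acc:.2f}")
print(f"strong student accuracy: {student_acc:.2f}")
```

Because the supervisor's mistakes are unsystematic, the best-fitting threshold lands near the true boundary, so the student recovers most of the gap between the weak teacher and a perfectly supervised model — the same qualitative effect the team reports for GPT-2 supervising GPT-4 on language tasks.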