Alongside the research update, OpenAI announced a new $10 million fund to support people working on superalignment. The company will offer grants of up to $2 million to university labs, nonprofits, and individual researchers, and one-year fellowships of $150,000 to graduate students. The move comes after a turbulent period for OpenAI, which saw CEO Sam Altman fired and then reinstated by its oversight board.
Key takeaways:
- OpenAI's superalignment team is working on techniques to prevent a hypothetical future superintelligence from going rogue. They are exploring how a less powerful model can supervise a more powerful one, which could be a step towards humans supervising superhuman machines.
- The team is focusing on aligning superhuman models, ensuring they do what humans want and avoid undesirable actions. Today's models are aligned using reinforcement learning from human feedback (RLHF), but this approach breaks down for superhuman models, whose actions might be beyond human understanding and therefore beyond human evaluation.
- In their research, the team used GPT-2 to supervise GPT-4, OpenAI's latest and most powerful model. The results were mixed: on language tasks, the weakly supervised GPT-4 outperformed its GPT-2 supervisor, but the approach fared worse on chess puzzles. The team concluded that the approach is promising but requires further development.
- OpenAI is encouraging more research into superalignment, announcing a new $10 million fund to support university labs, nonprofits, individual researchers, and graduate students working on this issue.
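The weak-to-strong setup in the takeaways above can be illustrated with a toy sketch. This is not OpenAI's actual method or code, just a minimal, self-contained analogy: a noisy "weak supervisor" labels data, a "strong student" with a better hypothesis class is trained only on those imperfect labels, and the student ends up more accurate than its teacher because the teacher's random errors average out. All names and parameters (the threshold rule, the 20% noise rate) are invented for illustration.

```python
import random

random.seed(0)

# Ground truth: label is 1 iff x > 0.
def true_label(x):
    return 1 if x > 0 else 0

# "Weak supervisor": knows the right rule but is noisy,
# flipping 20% of its labels at random.
def weak_label(x):
    lbl = true_label(x)
    return 1 - lbl if random.random() < 0.2 else lbl

# Training data labeled only by the weak supervisor.
train_x = [random.uniform(-1, 1) for _ in range(2000)]
weak_y = [weak_label(x) for x in train_x]

# "Strong student": searches for the threshold that best fits the
# weak labels. It never sees a true label during training.
candidates = [i / 100 for i in range(-100, 101)]
def fit_error(t):
    return sum((1 if x > t else 0) != y for x, y in zip(train_x, weak_y))
best_t = min(candidates, key=fit_error)

# Evaluate teacher and student against the true labels on held-out data.
test_x = [random.uniform(-1, 1) for _ in range(2000)]
teacher_acc = sum(weak_label(x) == true_label(x) for x in test_x) / len(test_x)
student_acc = sum((1 if x > best_t else 0) == true_label(x)
                  for x in test_x) / len(test_x)

print(f"weak teacher accuracy:  {teacher_acc:.2f}")
print(f"strong student accuracy: {student_acc:.2f}")
```

Because the supervisor's mistakes are unsystematic, the best-fitting threshold lands near the true boundary, so the student recovers most of the gap between the weak teacher and a perfectly supervised model — the same qualitative effect the team reports for GPT-2 supervising GPT-4 on language tasks.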