The self-moderation capability emerged in GPT-4, the latest version of the large language model, and was tested at the DEF CON security conference. OpenAI acknowledges that the system won't be perfect but is confident in its effectiveness. The company's approach differs from that of competitors like Anthropic, whose "Constitutional AI" system instills the model with a set of values that guide its behavior and the content it produces.
Key takeaways:
- OpenAI has been working on improving content moderation for its AI chatbot, ChatGPT, after users were able to bypass its moderation policies and elicit inappropriate responses.
- The company has faced criticism for its methods of blocking offensive content, which involved outsourcing to workers in Kenya who were exposed to disturbing content.
- The latest version of its large language model, GPT-4, can moderate content itself, which is expected to drastically reduce the number of human moderators needed (a hypothetical sketch of the pattern follows this list).
- The self-moderation system was tested at the DEF CON security conference, and while it's not perfect, OpenAI is confident in its effectiveness and plans to continue improving it based on feedback.
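OpenAI hasn't published the prompts or pipeline behind GPT-4's self-moderation, but the general pattern of policy-based LLM moderation is straightforward: hand the model a written policy along with the content to be judged, and ask it for a label. The sketch below illustrates that pattern using the openai Python package; the policy wording, the ALLOW/FLAG labels, and the choice of model are illustrative assumptions, not OpenAI's actual configuration.

```python
# Minimal sketch of policy-based LLM moderation (illustrative only).
# Assumes the openai Python package (>= 1.0) and an OPENAI_API_KEY in the
# environment; the policy text and labels here are hypothetical.
from openai import OpenAI

client = OpenAI()

MODERATION_POLICY = """\
You are a content moderator. Classify the user's message against this policy:
- ALLOW: ordinary conversation, questions, and requests.
- FLAG: harassment, hate speech, or instructions for causing harm.
Respond with exactly one word: ALLOW or FLAG."""

def moderate(message: str) -> str:
    """Ask the model to label a message according to the policy above."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": MODERATION_POLICY},
            {"role": "user", "content": message},
        ],
        temperature=0,  # deterministic labels suit an automated pipeline
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(moderate("How do I bake sourdough bread?"))  # expected: ALLOW
```

Pinning the temperature to 0 keeps the labels repeatable, which matters when the model's verdict feeds an automated moderation queue rather than a human reviewer.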