However, Mistral acknowledges that AI-based moderation systems, while promising, are prone to biases and technical flaws. The company claims its model is highly accurate but admits it is still a work in progress. Notably, Mistral did not benchmark its API against other popular moderation APIs, such as Jigsaw's Perspective API and OpenAI's moderation API. The company plans to keep working with customers and the research community to improve its moderation tooling and contribute to safety advancements in the field.
Key takeaways:
- Mistral, an AI startup, has launched a new API for content moderation, the same API that powers its Le Chat chatbot platform.
- The API uses a model trained to classify text into nine categories: sexual, hate and discrimination, violence and threats, dangerous and criminal content, self-harm, health, financial, law, and personally identifiable information (see the usage sketch after this list).
- Despite their potential benefits, AI-powered moderation systems are susceptible to biases and technical flaws, such as misinterpreting certain phrases or languages as "toxic".
- Mistral acknowledges that its moderation model is a work in progress and says it is working with customers and the research community to improve its tooling and contribute to safety advancements in the field.
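
For readers who want to see what calling the API might look like, here is a minimal sketch of an HTTP request to the moderation endpoint. The endpoint path, model name, and response layout are assumptions based on how the API was described at launch, not a verified contract; consult Mistral's documentation for the authoritative details.

```python
import os
import requests

# Assumed endpoint path and model identifier for Mistral's
# moderation API; check the official docs before relying on these.
API_URL = "https://api.mistral.ai/v1/moderations"
API_KEY = os.environ["MISTRAL_API_KEY"]  # assumed env var holding your key

payload = {
    "model": "mistral-moderation-latest",  # assumed model name
    "input": ["Example text to screen for policy violations."],
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()

# Assumed response shape: one result per input text, with boolean
# flags for each of the nine categories listed above (e.g.
# "selfharm", "hate_and_discrimination", "pii", ...).
for result in resp.json()["results"]:
    flagged = {name: hit for name, hit in result["categories"].items() if hit}
    print(flagged or "no categories flagged")
```

In a typical integration, a pipeline like this would run on user-generated text before it is stored or displayed, with flagged categories routed to whatever policy logic the application defines.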