The company advises against using SeamlessM4T for long-form, certified, medical, or legal translations due to potential inaccuracies. Previous instances of AI mistranslations have led to legal issues. Despite these concerns, Meta believes the model is a significant step towards creating universal multitask systems and plans to explore how it can enable new communication capabilities in the future.
Key takeaways:
- Meta has developed an AI model, SeamlessM4T, capable of translating and transcribing nearly 100 languages across text and speech. The model has been released as open source along with a new translation data set, SeamlessAlign.
- SeamlessM4T was trained using publicly available text and speech data from the web, but Meta has not revealed the exact sources of the data. The company claims that the data was not copyrighted and came primarily from open source or licensed sources.
- Despite its advanced capabilities, SeamlessM4T has shown biases, such as overgeneralizing to masculine forms when translating from gender-neutral terms. It also tends to produce more toxic translations for certain languages and for certain topics, such as socioeconomic status, culture, sexual orientation, and religion.
- Meta advises against using SeamlessM4T for long-form or certified translations, or for medical or legal purposes, due to potential inaccuracies and loss of lexical richness in translations.