
Researchers Propose Approach for Multilingual Dialogue Evaluation Metrics - SuperAGI News

Sep 01, 2023 - news.bensbites.co
Researchers from various universities have developed a new method to address the lack of multilingual data in open-sourced multilingual dialogue systems. They used a multilingual pretrained encoder-based Language Model and Machine Translation (MT) to augment English dialogue data. The study found that simply fine-tuning a pretrained multilingual encoder model with translated data does not outperform the existing baseline. Instead, using MT Quality Estimation (QE) metrics to curate translated data and exclude low-quality translations proved more effective.

The study also proposed using QE scores for response ranking for each target language to address the noise introduced by low-quality translations. This resulted in fine-tuned multilingual dialogue evaluation models that strongly correlated with human judgments. The study suggests that future research could involve evaluating generative model responses in different languages using annotators exposed to the culture associated with a given language.
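The two QE-based steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `qe_score` is a hypothetical stand-in for a real MT Quality Estimation model (e.g. a COMET-QE-style scorer), mocked here with a toy length-ratio heuristic, and the threshold value is arbitrary.

```python
# Sketch of the two QE-based steps: (1) filter out low-quality machine
# translations before fine-tuning, (2) rank candidate responses for a
# target language by QE score. qe_score is a hypothetical placeholder
# for a real Quality Estimation model.

def qe_score(source: str, translation: str) -> float:
    """Hypothetical QE scorer in [0, 1]; here a toy length-ratio heuristic."""
    return min(len(source), len(translation)) / max(len(source), len(translation), 1)

def filter_translations(pairs, threshold=0.7):
    """Keep only (source, translation) pairs whose QE score clears the threshold."""
    return [(src, tgt) for src, tgt in pairs if qe_score(src, tgt) >= threshold]

def rank_responses(source, candidates):
    """Order translated candidate responses from highest to lowest QE score."""
    return sorted(candidates, key=lambda cand: qe_score(source, cand), reverse=True)

pairs = [
    ("How are you today?", "¿Cómo estás hoy?"),  # plausible translation, kept
    ("How are you today?", "Sí."),               # degenerate translation, filtered out
]
print(filter_translations(pairs))
```

In practice a learned QE model would replace the heuristic, but the filtering and ranking logic stays the same: translations below a quality threshold are discarded before fine-tuning, and QE scores provide a uniform ranking criterion across target languages.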

Key takeaways:

  • Researchers have developed a new approach to address the lack of multilingual data in open-sourced multilingual dialogue systems, using a multilingual pretrained encoder-based Language Model and Machine Translation (MT).
  • Simply fine-tuning a pretrained multilingual encoder model with translated data does not outperform the existing baseline. Instead, using MT Quality Estimation (QE) metrics to curate translated data and exclude low-quality translations is more effective.
  • The authors proposed using QE scores for response ranking for each target language, which provides a standardized method for filtering and improving the method’s scalability to new languages.
  • The study suggests that filtering out low-quality translations can narrow the performance gap with ChatGPT and outperform it on select correlation metrics. Future research could involve evaluating generative model responses in different languages using annotators exposed to the culture associated with a given language.