
AI Search Engine Multilingual Evaluation Report - Complex Query (v1.1) | Glarity

May 12, 2024 - blog.glarity.app
The article describes an evaluation of AI search engines' ability to resolve complex queries. Because the Basic versions of the tested products proved not entirely satisfactory, Perplexity Pro was added to the testing scope. After rigorous testing, Perplexity Pro significantly outperformed the other products, achieving an accuracy rate of 80%, while the rest fell short. The test cases were constructed primarily around comparative and composite questions.

The article also provides a detailed case analysis of two questions to illustrate how the different AI search engines perform. In both cases, Perplexity Pro answered correctly, while the other platforms were inconsistent or incorrect. The primary cause of the inaccuracies was a failure to retrieve the correct source content. The article concludes that the quality of the large language models (LLMs) behind the various AI search engines still has room for improvement.

Key takeaways:

  • Perplexity Pro significantly outperformed other AI search engines in the evaluation, achieving an accuracy rate of 80%.
  • Large language models (LLMs) tend to fill gaps by inference when the retrieved sources are insufficient, which leads to frequent hallucinations.
  • The LLMs generating answers for Metaso and Perplexity (Basic) performed poorly, often producing incorrect answers even when the relevant information had been retrieved.
  • The evaluation focused on complex problems involving multiple points of information, whose answers require consolidation or reasoning across sources (a minimal scoring sketch follows this list).
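The report does not publish its scoring code, so the following Python sketch is an illustration only of how an accuracy figure like 80% can be computed over a set of such test cases. The `TestCase` structure, the `grade` containment check, and all names are hypothetical stand-ins, not the report's actual methodology.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    question: str   # a comparative or composite question
    reference: str  # expected answer, which may require consolidating multiple sources

def grade(answer: str, reference: str) -> bool:
    """Hypothetical grader: a simple containment check stands in for the
    human judgment the report presumably used to mark answers correct."""
    return reference.lower() in answer.lower()

def accuracy(answers: list[str], cases: list[TestCase]) -> float:
    """Fraction of test cases answered correctly."""
    correct = sum(grade(a, c.reference) for a, c in zip(answers, cases))
    return correct / len(cases)
```

With ten such cases and eight correct answers, `accuracy` returns 0.8, matching the 80% figure cited above.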