The article also provides a detailed case analysis of two questions to illustrate how the different AI search engines perform. In both cases, Perplexity Pro gave correct answers, while the other platforms were inconsistent or wrong. The primary cause of the inaccuracies was a failure to retrieve the relevant source content. The article concludes that there is room for improvement in the quality of the large language models (LLMs) that the various AI search engines rely on.
Key takeaways:
- Perplexity Pro significantly outperformed other AI search engines in the evaluation, achieving an accuracy rate of 80%.
- LLMs tend to extrapolate when the retrieved sources are insufficient, which leads to frequent hallucinations.
- The LLMs generating answers for Metaso and the basic tier of Perplexity performed poorly, often producing incorrect answers even when the relevant information had been retrieved.
- The evaluation focused on complex questions spanning multiple pieces of information, whose answers required consolidation or reasoning; a minimal grading sketch follows this list.
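The article does not publish its evaluation harness, but the setup it describes, posing multi-fact questions and grading consolidated answers, can be illustrated with a short sketch. Everything below is hypothetical: the question, the `ask_engine` stand-in, and the exact-match grading scheme are assumptions, not the article's method; a real harness would score each required fact separately or use a human judge.

```python
from dataclasses import dataclass

@dataclass
class Question:
    prompt: str
    expected: str  # consolidated ground-truth answer

# Placeholder question of the multi-point kind the evaluation describes:
# answering it correctly requires combining two separate facts.
QUESTIONS = [
    Question(
        prompt="In which year was company X founded, and who was its first CEO?",
        expected="1998; Jane Doe",
    ),
]

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(text.lower().split())

def is_correct(engine_answer: str, expected: str) -> bool:
    # Exact match after normalization; deliberately strict, since a partially
    # correct answer to a multi-point question still fails the consolidation test.
    return normalize(engine_answer) == normalize(expected)

def accuracy(answers: list[str], questions: list[Question]) -> float:
    correct = sum(is_correct(a, q.expected) for a, q in zip(answers, questions))
    return correct / len(questions)

if __name__ == "__main__":
    # Stand-in answers; in a real run these would come from each search engine.
    mock_answers = ["1998; Jane Doe"]
    print(f"accuracy = {accuracy(mock_answers, QUESTIONS):.0%}")
```

An accuracy figure like the 80% reported for Perplexity Pro is simply this ratio of correctly consolidated answers to total questions; the hard part of such an evaluation is the grading rule, not the arithmetic.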