Feature Story
Launch HN: Confident AI (YC W25) – Open-source evaluation framework for LLM apps
Feb 20, 2025 · news.ycombinator.com
Despite its capabilities, DeepEval's primary evaluation method, LLM-as-a-judge, faces consistency challenges. To address this, Confident AI introduced a DAG metric, a decision-tree-based approach that produces deterministic results by breaking test cases into atomic units. The metric is particularly effective in scenarios with clearly defined success criteria, such as text summarization. Although still in its early stages, the DAG metric aims to offer reliable, code-driven, open-source metrics for LLM benchmarking. Confident AI is available on a freemium tier, and the requirement to sign up with a work email is temporarily waived.
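As a rough illustration of the decision-tree idea, the sketch below scores a summary by walking a small tree of atomic yes/no checks until it reaches a fixed verdict. It is plain Python, not DeepEval's API: the names `Check`, `Verdict`, `evaluate`, and `summary_tree` are hypothetical, and the predicates are simple string checks chosen only so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Verdict:
    """Leaf node: a fixed score returned when this branch is reached."""
    score: float


@dataclass
class Check:
    """Internal node: one atomic yes/no question about the output."""
    question: str
    predicate: Callable[[str], bool]  # hypothetical stand-in for a judgement step
    if_yes: Union["Check", Verdict]
    if_no: Union["Check", Verdict]


def evaluate(node: Union[Check, Verdict], output: str) -> float:
    """Walk the tree from the root until a verdict (leaf) is reached."""
    while isinstance(node, Check):
        node = node.if_yes if node.predicate(output) else node.if_no
    return node.score


SECTIONS = ("intro", "body", "conclusion")  # assumed success criteria


def has_all_sections(text: str) -> bool:
    lowered = text.lower()
    return all(name in lowered for name in SECTIONS)


def sections_in_order(text: str) -> bool:
    lowered = text.lower()
    positions = [lowered.find(name) for name in SECTIONS]
    return positions == sorted(positions)


# The same answers to the checks always lead to the same leaf and score.
summary_tree = Check(
    question="Are all three sections present?",
    predicate=has_all_sections,
    if_yes=Check(
        question="Are the sections in the expected order?",
        predicate=sections_in_order,
        if_yes=Verdict(1.0),
        if_no=Verdict(0.5),
    ),
    if_no=Verdict(0.0),
)

print(evaluate(summary_tree, "Intro: a. Body: b. Conclusion: c."))
```

Because each leaf carries a fixed score, the result depends only on the answers to the individual checks, which is what makes this style of metric repeatable when the success criteria are clearly defined.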
Key takeaways
- Confident AI is a cloud platform built around DeepEval, an open-source package for evaluating and unit-testing LLM applications.
- The platform includes features like a dataset editor, regression catcher, and iteration insights to enhance LLM evaluation and benchmarking.
- Confident AI aims to make benchmarking more reliable with a new DAG metric that produces deterministic results, addressing the consistency challenges of LLM-as-a-judge evaluation.
- The platform is available on a freemium tier, with a temporary option to sign up without a work email.