GitHub - confident-ai/deepeval: Evaluation and Unit Testing for LLMs

Sep 25, 2023 - github.com
DeepEval is a Pythonic tool designed to facilitate offline evaluations of large language model (LLM) pipelines, aiming to streamline the process of productionizing and evaluating LLMs. It offers opinionated tests for answer relevancy, factual consistency, toxicity, and bias; a web UI to view tests, implementations, and comparisons; and auto-evaluation through synthetic query-answer creation. The tool also integrates tightly with common frameworks such as LangChain and LlamaIndex, and supports generating synthetic queries for quickly evaluating prompts.

The motivation behind DeepEval is to simplify testing for LLM applications such as Retrieval-Augmented Generation (RAG) by making writing tests as straightforward as authoring unit tests in Python. The tool aims to extend the familiar abstractions and tooling found in general software development to ML engineers, facilitating a rapid feedback loop for iterative improvements. DeepEval is built by the Confident AI team and is designed to revolutionize how LLM tests are written, run, automated, and managed.
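
To make this concrete, here is a minimal sketch of what such a unit-test-style evaluation could look like. It uses names from deepeval's documented API (LLMTestCase, AnswerRelevancyMetric, assert_test); the exact import paths, parameters, and threshold are assumptions and may differ between versions.

    # Sketch only: identifiers follow deepeval's documented API and may vary by version.
    from deepeval import assert_test
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    def test_answer_relevancy():
        # Wrap a single prompt/response pair (plus retrieved context) in a test case.
        test_case = LLMTestCase(
            input="What is DeepEval used for?",
            actual_output="DeepEval lets you write offline evaluations for LLM pipelines.",
            retrieval_context=["DeepEval provides unit-test-style evaluation for LLM applications."],
        )
        # Fail the test if the answer-relevancy score falls below the threshold.
        assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])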

Key takeaways:

  • DeepEval is a Pythonic tool designed to run offline evaluations on large language model (LLM) pipelines, making productionizing and evaluating LLMs as easy as making sure all tests pass.
  • The tool provides opinionated tests for answer relevancy, factual consistency, toxicity, and bias; a web UI to view tests, implementations, and comparisons; and auto-evaluation through synthetic query-answer creation.
  • DeepEval integrates tightly with common frameworks such as LangChain and LlamaIndex, and allows synthetic queries to be generated for quickly evaluating your prompts.
  • The motivation behind DeepEval is to streamline testing for LLM applications, extending the familiar abstractions and tooling found in general software development to ML engineers to facilitate a more rapid feedback loop (see the CLI sketch below).
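
Tests written this way can be collected and run in bulk from the command line. The invocation below reflects deepeval's documented CLI test runner and is an assumption that may change between versions:

    deepeval test run test_answer_relevancy.py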