Continuous-eval also features ensemble metrics that predict user feedback, giving developers a feedback loop from production data back to offline testing and development. The founders also emphasize the importance of evaluating against a diverse dataset, and offer a synthetic data generation pipeline to help users get started quickly. They are seeking feedback on their modular framework, on how best to leverage user feedback, and on testing with synthetic data.
Key takeaways:
- Relari has developed continuous-eval, an evaluation framework that tests GenAI systems at the component level, making it easier to pinpoint where in a pipeline a failure originates.
- Continuous-eval lets users programmatically describe their pipeline and its modules, then select metrics for each module; it ships with 30+ metrics covering various aspects of GenAI pipelines.
- Relari's system also includes ensemble metrics that predict user feedback, providing developers with a feedback loop from production data to offline testing and development.
- Relari also offers a synthetic data generation pipeline so users can get started quickly, emphasizing that a diverse evaluation dataset is essential for comprehensive, consistent assessment.
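To make the component-level idea concrete, here is a minimal, hypothetical sketch of what per-module evaluation looks like: a pipeline is described as a list of modules, each with its own metrics, and each module is scored on its own dataset slice. This is an illustration of the concept only, not continuous-eval's actual API; all names (`Module`, `evaluate`, `retrieval_recall`) are invented for this example.

```python
# Hypothetical illustration of component-level evaluation.
# Not continuous-eval's real API; names are invented for this sketch.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Module:
    """One pipeline component plus the metrics used to score it."""
    name: str
    metrics: list[Callable[[dict], float]] = field(default_factory=list)

def exact_match(sample: dict) -> float:
    # Generation metric: 1.0 if the output string matches the reference.
    return 1.0 if sample["output"] == sample["expected"] else 0.0

def retrieval_recall(sample: dict) -> float:
    # Retrieval metric: fraction of relevant documents that were retrieved.
    retrieved, relevant = set(sample["output"]), set(sample["expected"])
    return len(retrieved & relevant) / len(relevant) if relevant else 1.0

def evaluate(modules: list[Module], dataset: dict) -> dict:
    # dataset maps module name -> list of {"output", "expected"} samples,
    # so each component is scored in isolation rather than end-to-end.
    results = {}
    for m in modules:
        samples = dataset.get(m.name, [])
        results[m.name] = {
            metric.__name__: sum(metric(s) for s in samples) / len(samples)
            for metric in m.metrics
        } if samples else {}
    return results

# A toy two-module RAG pipeline: retriever feeding a generator.
pipeline = [
    Module("retriever", metrics=[retrieval_recall]),
    Module("generator", metrics=[exact_match]),
]
dataset = {
    "retriever": [{"output": ["doc1", "doc3"], "expected": ["doc1", "doc2"]}],
    "generator": [{"output": "Paris", "expected": "Paris"}],
}
scores = evaluate(pipeline, dataset)
print(scores)
# Per-module scores reveal that the retriever (recall 0.5) is the weak
# link even though the generator's output is correct.
```

The payoff of scoring each module separately is exactly the debugging story in the takeaways above: an end-to-end score alone could not tell you whether a bad answer came from the retriever or the generator.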