
Giskard’s open-source framework evaluates AI models before they’re pushed into production | TechCrunch

Nov 14, 2023 - news.bensbites.co
Giskard, a French startup, is developing an open-source testing framework for large language models (LLMs) that can alert developers to biases, security risks, and the potential to generate harmful or toxic content. The framework includes an open-source Python library, a test suite for models, and a real-time monitoring tool called LLMon. The tests cover a range of issues including performance, misinformation, biases, and harmful content generation. The startup is also working on a premium AI quality hub to help debug and compare models.

The company is positioning itself to help AI developers comply with upcoming regulations like the EU's AI Act, which will require companies to prove their AI models comply with certain rules and mitigate risks. Giskard's tools can be integrated into a continuous integration and continuous delivery (CI/CD) pipeline, allowing for regular testing and immediate feedback on any issues. The company is already selling its AI Quality Hub to customers such as the Banque de France and L'Oréal, and plans to double its team size, aiming to become the market's leading "antivirus for LLMs".
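The CI/CD pattern described above can be sketched generically. The following is a minimal, hypothetical illustration and does not use Giskard's actual API: a small suite of checks runs against a model's outputs, and in a CI job the process would exit non-zero on any failure, giving the immediate feedback the article mentions. All function and check names here are invented for the sketch.

```python
# Hypothetical CI-style check harness for LLM outputs (not Giskard's API).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""

def check_no_banned_terms(output: str, banned: List[str]) -> CheckResult:
    """Toy harmful-content check: flag outputs containing banned terms."""
    hits = [w for w in banned if w in output.lower()]
    return CheckResult("no_banned_terms", not hits, f"found: {hits}" if hits else "")

def check_max_length(output: str, limit: int) -> CheckResult:
    """Toy quality guardrail: fail if the response exceeds a length budget."""
    return CheckResult("max_length", len(output) <= limit, f"{len(output)} chars")

def run_suite(model: Callable[[str], str], prompts: List[str]) -> List[CheckResult]:
    """Run every check on the model's output for every prompt."""
    banned = ["badword_a", "badword_b"]  # placeholder terms
    results: List[CheckResult] = []
    for prompt in prompts:
        out = model(prompt)
        results.append(check_no_banned_terms(out, banned))
        results.append(check_max_length(out, 500))
    return results

# Usage: a stub model standing in for a real LLM call.
fake_model = lambda prompt: f"Safe answer to: {prompt}"
results = run_suite(fake_model, ["What is AI?", "Summarize the EU AI Act."])
failed = [c for c in results if not c.passed]
print(f"{len(results)} checks, {len(failed)} failed")
# In a real pipeline, the job would exit non-zero here if `failed` is non-empty,
# blocking the deployment until the model passes.
```

The point of wiring such checks into CI rather than running them ad hoc is that every model or prompt change is tested automatically before it can reach production.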

Key takeaways:

  • Giskard is a French startup that has developed an open-source testing framework for large language models (LLMs), which can alert developers to potential biases, security risks, and harmful content generation.
  • The company's testing framework includes an open-source Python library, a test suite for regular use on models, and a real-time monitoring tool called LLMon.
  • Giskard's second product, the AI Quality Hub, helps debug LLMs and compare them to other models. The company is also working on generating documentation to prove a model's compliance with regulations.
  • With the impending enforcement of the AI Act in the EU, Giskard is positioning itself as a tool for developers to ensure their AI models comply with regulations and avoid potential fines.
