
GitHub - crizCraig/evals: Run safety evals across providers (OpenAI, Anthropic, etc...)

Jun 22, 2024 - github.com
This README covers safety evaluations of LLMs (large language models); the results are now hosted at Evals.gg. The latest update was made on April 28, 2024, and a bar chart of the results is included. An 'X post' is also mentioned, but without further context.

The latter part of the README explains how to set up and run a Python environment for the 'evals' project using conda. It also covers running Redis as a temporary cache, so that the fetch code can be re-run without re-fetching identical prompts, and ends with a command to fetch the latest results for all models.
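The summary does not quote the exact commands, so the following is a sketch of the sequence it describes; the environment name and the fetch script name are assumptions, not the repository's actual commands.

```sh
# Create and activate a Python 3.12 environment with conda
# (the environment name "evals" is an assumption).
conda create -n evals python=3.12
conda activate evals

# Start Redis locally for temporary caching of fetched prompts
# (assumes Redis is installed and listening on the default port 6379).
redis-server --daemonize yes

# Fetch the latest results for all models
# (the script name is hypothetical; see the repository README for the real command).
python fetch.py --all-models
```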

Key takeaways:

  • The safety evaluation results for LLMs (large language models) are now hosted at Evals.gg.
  • The results include a bar chart from the evaluation conducted on April 28, 2024.
  • Setup involves creating and activating a Python 3.12 environment with conda.
  • Redis provides temporary caching so the fetch code can be re-run without re-fetching identical prompts; a sketch of this pattern follows the list.
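The repository's actual caching code is not shown in the summary; a minimal sketch of the pattern it describes, assuming the redis-py client and a placeholder for the real provider call, might look like:

```python
import hashlib
import json

import redis

# Connect to the local Redis instance used as a temporary cache
# (default host and port are assumptions).
cache = redis.Redis(host="localhost", port=6379)


def query_model(model: str, prompt: str) -> str:
    # Placeholder for the real provider call (OpenAI, Anthropic, etc.).
    raise NotImplementedError


def fetch_completion(model: str, prompt: str) -> str:
    """Return a model completion, reusing a cached response for identical prompts."""
    # Key the cache on the model name and the exact prompt text.
    key = "evals:" + hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        # Cache hit: identical prompt seen before, skip the provider call.
        return cached.decode()
    # Cache miss: call the provider and store the response with a 24-hour TTL.
    response = query_model(model, prompt)
    cache.setex(key, 24 * 3600, response)
    return response
```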
