Asking 60+ LLMs a set of 20 questions

Sep 09, 2023 - news.bensbites.co
The author has developed a script to test the performance of around 60 AI models using basic reasoning, instruction following, and creativity prompts. The script stores all the responses in a SQLite database, providing raw results for each model. The prompts cover a range of topics, including reflection, knowledge, coding, instruction, and creativity.
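As a rough illustration of the approach described above, the following sketch loops over models and prompts and writes each response to SQLite. It is not the author's script: the table layout, the model names, the prompt text, and the `query_model` helper are all hypothetical placeholders, and a real run would replace `query_model` with calls to the relevant provider APIs.

```python
import sqlite3


def query_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real API call; returns a canned string."""
    return f"[{model}] response to: {prompt}"


def run_benchmark(conn: sqlite3.Connection, models, prompts) -> None:
    """Ask every model every prompt and store the raw responses."""
    # Assumed schema: one row per (model, prompt) pair.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS responses (
               model    TEXT NOT NULL,
               category TEXT NOT NULL,
               prompt   TEXT NOT NULL,
               response TEXT NOT NULL,
               PRIMARY KEY (model, prompt)
           )"""
    )
    for model in models:
        for category, prompt in prompts:
            answer = query_model(model, prompt)
            # INSERT OR REPLACE lets the script be re-run without duplicates.
            conn.execute(
                "INSERT OR REPLACE INTO responses VALUES (?, ?, ?, ?)",
                (model, category, prompt, answer),
            )
    conn.commit()


# Placeholder inputs; the real test set has about 60 models and 20 prompts.
conn = sqlite3.connect(":memory:")
models = ["model-a", "model-b"]
prompts = [
    ("reasoning", "Placeholder reasoning question"),
    ("creativity", "Placeholder creativity prompt"),
]
run_benchmark(conn, models, prompts)
row_count = conn.execute("SELECT COUNT(*) FROM responses").fetchone()[0]
```

Keeping one row per model and prompt makes it straightforward to pull up every model's answer to the same question for side-by-side reading.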

The author acknowledges that the current system is imperfect and aims to improve it by using better stop sequences and prompt formatting tailored to each model. Future ideas include public votes to compute an Elo rating, comparing two models side by side, and community-submitted prompts. The author is open to suggestions and feedback.
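The rating idea could work roughly as follows: each public vote between two models is treated as a match, and both ratings are adjusted with the standard Elo update. This is only a sketch of the general formula; the starting rating of 1000 and the K-factor of 32 are assumptions, not values from the article.

```python
def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one vote.

    score_a is 1.0 if model A won the vote, 0.0 if it lost, 0.5 for a tie.
    """
    # Expected score of A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    expected_b = 1.0 - expected_a
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - expected_b)
    return new_a, new_b


# Two models start level; model A wins one vote.
new_a, new_b = update_elo(1000.0, 1000.0, score_a=1.0)
```

With equal starting ratings the expected score is 0.5 for each side, so a single win moves the winner up by half the K-factor and the loser down by the same amount.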

Key takeaways:

  • The author created a script to test around 60 AI models on their basic reasoning, instruction following, and creativity skills.
  • The script stored all the responses in a SQLite database, providing raw results for comparison.
  • The author used a mix of APIs from OpenRouter, TogetherAI, OpenAI, Cohere, Aleph Alpha & AI21 for the testing.
  • The author plans to improve the testing process by using better stop sequences and prompt formatting tailored to each model.
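To make the last takeaway concrete, here is a minimal sketch of what per-model prompt formatting and stop sequences might look like. The model names, templates, and stop strings are invented for illustration; real models each document their own formats, and many APIs accept stop sequences directly as a request parameter instead of requiring client-side truncation.

```python
# Hypothetical per-model settings; real templates and stop strings vary.
MODEL_FORMATS = {
    "chat-style-model": {
        "template": "User: {prompt}\nAssistant:",
        "stop": ["\nUser:"],
    },
    "instruct-style-model": {
        "template": "### Instruction:\n{prompt}\n### Response:\n",
        "stop": ["### Instruction:"],
    },
}


def format_prompt(model: str, prompt: str) -> str:
    """Wrap the raw prompt in the template assumed for this model."""
    return MODEL_FORMATS[model]["template"].format(prompt=prompt)


def truncate_at_stop(text: str, stops: list[str]) -> str:
    """Cut the completion at the earliest stop sequence, if any appears."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


formatted = format_prompt("chat-style-model", "Hi")
raw_completion = " Paris.\nUser: and what about Spain?"
cleaned = truncate_at_stop(raw_completion, MODEL_FORMATS["chat-style-model"]["stop"])
```

Without a stop sequence, a base model will often keep generating past its answer and invent the next turn of the conversation, which is the kind of noise tailored stop sequences are meant to remove.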