
GitHub - lechmazur/divergent: LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each other or to 50 initial random words.

Dec 30, 2024 - github.com
The LLM Divergent Thinking Creativity Benchmark evaluates the originality and fluency of large language models (LLMs) through a challenging variation of divergent thinking tests. In this benchmark, LLMs are tasked with generating 25 words that are distinct from each other and unrelated to an initial list of 50 random words, with the constraint that each word must start with a specified letter. The benchmark covers multiple LLMs, including GPT-4o, Claude 3.5 Sonnet, Grok 2, and Gemini 1.5 Pro, which are evaluated on their ability to generate distinct words and adhere to the rules. Performance is measured as the average of each generated word's minimum divergence from the others, with higher scores indicating better performance.

The results show varying performance among the models, with o1-preview achieving the highest score of 4.79 and GPT-4o scoring the lowest at 3.73. The percentage of repeated words is also analyzed, revealing that Llama 3.3 70B and o1-preview had no repeats, while GPT-4o had a high repeat rate of 23.68%. This repetition rate helps explain GPT-4o's lower performance. The benchmark provides insights into the creativity and distinctiveness capabilities of different LLMs, highlighting areas for improvement in generating unique and unrelated words.
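The average-of-minimum-divergences scoring described above can be sketched as follows. The article does not specify the divergence metric, so cosine distance over word embeddings is an assumption here, and `min_divergence_scores` is a hypothetical name, not the benchmark's actual code:

```python
import math

def min_divergence_scores(embeddings):
    """Score a set of word embeddings by the average of each word's
    minimum divergence from every other word. A single near-duplicate
    pair drags the average down, which is why repeats hurt the score.

    Assumption: divergence here is cosine distance; the benchmark's
    actual metric is not specified in the article.
    """
    def cosine_distance(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return 1.0 - dot / (na * nb)

    minima = []
    for i, a in enumerate(embeddings):
        # Minimum distance from word i to any other word in the set.
        dists = [cosine_distance(a, b)
                 for j, b in enumerate(embeddings) if j != i]
        minima.append(min(dists))
    return sum(minima) / len(minima)
```

Under this sketch, a set containing two identical embeddings scores strictly lower than a set of mutually orthogonal ones, matching the article's point that repetition depresses the score.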

Key takeaways:

  • The LLM Divergent Thinking Creativity Benchmark evaluates originality and fluency by having LLMs generate distinct words unrelated to an initial list.
  • Each LLM generates 2,200 words, evaluated by four LLMs for distinctiveness and adherence to rules.
  • Higher scores indicate better performance, with o1-preview achieving the highest score of 4.79.
  • GPT-4o performed poorly due to a high percentage of repeated words, at 23.68%.
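The repeat rate cited in the takeaways (23.68% for GPT-4o, 0% for Llama 3.3 70B and o1-preview) can be computed with a simple sketch like the one below. The benchmark's exact definition of a repeat is not given in the article; case-insensitive exact matching is an assumption:

```python
def repeat_rate(words):
    """Fraction of generated words that duplicate an earlier word.

    Assumption: a repeat is a case-insensitive exact match; the
    benchmark's actual repeat criterion is not stated in the article.
    """
    seen = set()
    repeats = 0
    for w in words:
        key = w.lower()
        if key in seen:
            repeats += 1  # count every occurrence after the first
        else:
            seen.add(key)
    return repeats / len(words)
```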
