
Paper page - On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models

Jul 20, 2023 - huggingface.co
The paper observes that Large Language Models (LLMs) have risen to prominence since late 2022, with services such as ChatGPT and Bard attracting millions of users. Hundreds of new LLMs are announced each week, and nearly 16,000 text generation models have been uploaded to Hugging Face, a repository of machine learning models and datasets. This volume makes it hard to tell which LLM backbones, settings, training methods, and families are popular or trending, and no comprehensive index of LLMs is currently available.

To address this, the authors take advantage of the relatively systematic naming of Hugging Face LLMs: they perform hierarchical clustering and identify communities among LLMs using n-grams and term frequency-inverse document frequency (TF-IDF). They have developed a public web application, Constellation, which serves as an atlas of 15,821 LLMs. The application generates visualizations including dendrograms, graphs, word clouds, and scatter plots to help users explore and understand LLMs.
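
To make the idea concrete, here is a minimal Python sketch of clustering model names by n-gram TF-IDF similarity. It is not the authors' code: the function names, the choice of character trigrams, the smoothed IDF formula, average linkage, and the 0.2 similarity threshold are all assumptions picked for illustration, and the four model names are a toy input.

```python
import math
from collections import Counter

def char_ngrams(name, n=3):
    """Return the character n-grams of a lowercased model name."""
    s = name.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def tfidf_vectors(names, n=3):
    """Build one L2-normalized sparse TF-IDF vector (a dict) per name."""
    counts = [Counter(char_ngrams(name, n)) for name in names]
    doc_freq = Counter(g for c in counts for g in c)
    total = len(names)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumption): n-grams found in every name
        # still receive a small positive weight.
        vec = {g: tf * (math.log((1 + total) / (1 + doc_freq[g])) + 1)
               for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(g, 0.0) for g, w in a.items())

def agglomerate(names, threshold=0.2, n=3):
    """Average-linkage agglomerative clustering over name similarity.

    Repeatedly merges the two most similar clusters and stops once the
    best remaining pair has mean similarity below `threshold`.
    """
    vecs = tfidf_vectors(names, n)
    sim = [[cosine(u, v) for v in vecs] for u in vecs]
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = sum(sim[i][j] for i in clusters[x] for j in clusters[y])
                s /= len(clusters[x]) * len(clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] += clusters.pop(y)  # y > x, so index x stays valid
    return [sorted(names[i] for i in c) for c in clusters]

# Toy input for illustration only.
example_names = ["llama-7b", "llama-13b", "bloom-560m", "bloom-7b1"]
for group in agglomerate(example_names):
    print(group)
```

On this toy input, names sharing a family prefix group together even though `llama-7b` and `bloom-7b1` share the `-7b` trigram. A real atlas of thousands of models would more likely rely on an optimized library for the linkage step, since this pairwise loop scales poorly.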

Key takeaways:

  • Large Language Models (LLMs) have become very prominent since late 2022, with hundreds of new LLMs being announced each week.
  • There is a need to understand which LLM backbones, settings, training methods, and families are popular or trending, but there is no comprehensive index of LLMs available.
  • The authors have developed Constellation, an atlas of 15,821 LLMs, built by performing hierarchical clustering and identifying communities among LLMs using n-grams and term frequency-inverse document frequency (TF-IDF).
  • Constellation is a public web application that generates a variety of visualizations to help navigate and explore the LLMs, and is available at https://constellation.sites.stanford.edu/.