To address this, the authors use the systematic naming conventions of LLMs hosted on Hugging Face to perform hierarchical clustering and identify communities among LLMs, using n-grams and term frequency-inverse document frequency (TF-IDF). They have developed a public web application, Constellation, which serves as an atlas of 15,821 LLMs. The application generates visualizations such as dendrograms, graphs, word clouds, and scatter plots to aid the exploration and understanding of LLMs.
Key takeaways:
- Large Language Models (LLMs) have become very prominent since late 2022, with hundreds of new LLMs being announced each week.
- There is a need to understand which LLM backbones, settings, training methods, and families are popular or trending, but there is no comprehensive index of LLMs available.
- The authors have developed Constellation, an atlas of 15,821 LLMs, built by applying hierarchical clustering and community identification to LLM names represented with n-grams and term frequency-inverse document frequency.
- Constellation is a public web application that generates a variety of visualizations to help navigate and explore the LLMs, and is available at https://constellation.sites.stanford.edu/.
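
To make the name-based clustering idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the character trigrams, the smoothed IDF formula, cosine distance, average linkage, the 0.8 cut-off, and the short list of sample model names are all assumptions chosen for illustration. In practice a library such as scikit-learn or SciPy would normally handle these steps.

```python
import math
from collections import Counter

# Illustrative sample only; the real atlas covers 15,821 models.
SAMPLE_NAMES = [
    "llama-2-7b-chat", "bert-base-uncased", "llama-2-13b-chat",
    "bert-large-uncased", "llama-2-7b-hf", "bert-base-cased",
]

def ngrams(name, n=3):
    """Character n-grams of a lowercased model name (n=3 is an assumption)."""
    s = name.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def tfidf_vectors(names, n=3):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per name."""
    docs = [Counter(ngrams(name, n)) for name in names]
    # Document frequency: in how many names does each n-gram occur?
    df = Counter(g for doc in docs for g in doc)
    total = len(docs)
    vectors = []
    for doc in docs:
        # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1
        vec = {g: tf * (math.log((1 + total) / (1 + df[g])) + 1.0)
               for g, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors

def cosine_distance(a, b):
    """1 - cosine similarity for two unit-length sparse vectors."""
    return 1.0 - sum(w * b.get(g, 0.0) for g, w in a.items())

def cluster(names, threshold=0.8, n=3):
    """Average-linkage agglomerative clustering of names.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (a hypothetical cut-off).
    """
    vecs = tfidf_vectors(names, n)
    dist = [[cosine_distance(a, b) for b in vecs] for a in vecs]
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = len(clusters[i]) * len(clusters[j])
                d = sum(dist[p][q] for p in clusters[i]
                        for q in clusters[j]) / pairs
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return [sorted(names[k] for k in c) for c in clusters]

if __name__ == "__main__":
    for group in cluster(SAMPLE_NAMES):
        print(group)
```

Recording the merge distances instead of stopping at a threshold would give the data needed for a dendrogram, one of the visualizations the application offers.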