
Opening up ChatGPT: LLM openness leaderboard

Jun 19, 2024 - news.bensbites.com
The paper by Liesenfeld and Dingemanse, presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24), critically examines the openness of instruction-tuned text generators that are often marketed as 'open source'. The authors argue that open research is crucial for progress in science and engineering, and that the proprietary nature of some models, such as ChatGPT, makes them unsuitable for responsible use in research and education. They provide a comprehensive table evaluating over 40 ChatGPT alternatives on their openness, development, and documentation.

The paper highlights several recurring patterns, such as the use of data of dubious legality, the rarity of shared instruction-tuning data, and the scarcity of peer-reviewed papers. The authors conclude that while openness is not a complete solution to the scientific and ethical challenges of conversational text generators, it enables original research and reproducible workflows, and fosters a culture of accountability. They hope their work will contribute to this direction.

Key takeaways:

  • The paper discusses the importance of openness in research, particularly for instruction-tuned language models. It argues that proprietary models like ChatGPT are unfit for responsible use in research and education due to their closed nature.
  • The authors have created a table that tracks the openness, transparency, and accountability of over 40 alternatives to ChatGPT. The table provides a three-level openness judgement for each project and is sorted by cumulative openness.
  • Recurrent patterns in the data show that many projects use data of dubious legality, few share instruction-tuning data, preprints are rare and peer-reviewed papers even rarer, and the use of synthetic instruction-tuning data is increasing, with unknown consequences.
  • While openness is not a complete solution to the scientific and ethical challenges of conversational text generators, it does enable original research, reproducible workflows, and a culture of accountability for data and its curation, and for models and their deployment.
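The leaderboard's ranking logic described above (a three-level openness judgement per feature, sorted by cumulative openness) can be sketched roughly as follows. Note that the dimension names, project data, and the numeric mapping of the three levels are illustrative assumptions for this sketch, not the paper's actual features or scores.

```python
# Hypothetical sketch of a cumulative-openness leaderboard.
# Assumption: each feature gets one of three judgements (open / partial /
# closed), mapped here to 1 / 0.5 / 0; the real paper's features and
# scoring may differ.

LEVELS = {"open": 1.0, "partial": 0.5, "closed": 0.0}

def cumulative_openness(ratings):
    """Sum the per-feature openness levels for one project."""
    return sum(LEVELS[level] for level in ratings.values())

# Illustrative projects and features (not from the paper's table).
projects = {
    "Project A": {"weights": "open", "training_data": "partial", "paper": "closed"},
    "Project B": {"weights": "open", "training_data": "open", "paper": "partial"},
}

# Sort projects by cumulative openness, most open first, as the table does.
leaderboard = sorted(
    projects, key=lambda name: cumulative_openness(projects[name]), reverse=True
)
print(leaderboard)
```

A single cumulative score makes the ranking easy to read, though it treats all features as equally important; the underlying per-feature judgements remain visible in the table itself.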