The paper highlights several recurring patterns, such as the use of data of dubious legality, the rarity of shared instruction-tuning data, and the scarcity of peer-reviewed papers. The authors conclude that while openness is not a complete solution to the scientific and ethical challenges of conversational text generators, it enables original research and reproducible workflows, and fosters a culture of accountability. They hope their work contributes to this direction.
Key takeaways:
- The paper argues for openness in research on instruction-tuned language models, contending that proprietary models like ChatGPT are unfit for responsible use in research and education because of their closed nature.
- The authors have created a table that tracks the openness, transparency, and accountability of over 40 alternatives to ChatGPT. For each project, the table records a three-level openness judgement per dimension and is sorted by cumulative openness (see the sketch after this list).
- Recurring patterns in the data: many projects use data of dubious legality, few share instruction-tuning data, preprints are rare and peer-reviewed papers rarer still, and the use of synthetic instruction-tuning data is increasing, with unknown consequences.
- While openness is not a complete solution to the scientific and ethical challenges of conversational text generators, it does enable original research, reproducible workflows, and a culture of accountability for data and its curation, and for models and their deployment.
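The table's sorting scheme lends itself to a compact illustration: treat each three-level judgement as a small integer and sum across dimensions to get the cumulative score used for ranking. Below is a minimal Python sketch, assuming a 0–2 scale and made-up project and dimension names; the paper's actual dimensions and scoring may differ.

```python
from enum import IntEnum


class Openness(IntEnum):
    """Three-level openness judgement; the 0-2 scale is an assumption."""
    CLOSED = 0
    PARTIAL = 1
    OPEN = 2


# Hypothetical projects and dimensions, purely for illustration;
# the paper's actual dimensions and judgements differ.
projects = {
    "ProjectA": {
        "training data": Openness.OPEN,
        "model weights": Openness.OPEN,
        "instruction-tuning data": Openness.PARTIAL,
        "peer-reviewed paper": Openness.CLOSED,
    },
    "ProjectB": {
        "training data": Openness.CLOSED,
        "model weights": Openness.PARTIAL,
        "instruction-tuning data": Openness.CLOSED,
        "peer-reviewed paper": Openness.CLOSED,
    },
}


def cumulative_openness(judgements: dict) -> int:
    """Sum the per-dimension levels into a single cumulative score."""
    return sum(judgements.values())


# Rank projects from most to least open, mirroring the table's sort order.
for name in sorted(projects, key=lambda p: cumulative_openness(projects[p]),
                   reverse=True):
    print(f"{name}: {cumulative_openness(projects[name])}")
```

One design consequence of such a cumulative score is that it treats all dimensions as equally important; a project that shares weights but hides its training data can tie with one that does the reverse, which is why the per-dimension judgements remain visible in the table rather than being collapsed into the score alone.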