The authors have used this annotated data to train and evaluate GPT-3, GPT-4 and RoBERTa models on case outcome extraction, providing benchmarks for future research. Because the material is sensitive, the article also includes a detailed legal and ethical discussion. The CLC will be released only for research purposes and under specific restrictions.
Key takeaways:
- The Cambridge Law Corpus (CLC) is introduced as a corpus for legal AI research, consisting of over 250,000 UK court cases, some dating back to the 16th century.
- The first release of the corpus includes raw text and metadata, as well as case outcome annotations for 638 cases, produced by legal experts.
- GPT-3, GPT-4 and RoBERTa models have been trained and evaluated on case outcome extraction, providing benchmarks for future research.
- Because the material is sensitive, the article includes an extensive legal and ethical discussion, and the corpus will be released only for research purposes under certain restrictions.