The collection was sourced from Duxiu, a massive database of scanned books created by the SuperStar Digital Library Group, and was shared with Anna's Archive by a volunteer. It is difficult to obtain in bulk, is larger than the Library Genesis non-fiction collection, and totals about 359TB in its current form. Anna's Archive is open to other proposals and ideas regarding this collection.
Key takeaways:
- Anna's Archive has acquired a unique collection of 7.5 million Chinese non-fiction books (roughly 350TB), which is larger than the Library Genesis non-fiction collection.
- Anna's Archive is willing to give an LLM company exclusive early access to this collection for one year in exchange for high-quality OCR and text extraction.
- The collection was obtained from Duxiu, a massive database of scanned books created by the SuperStar Digital Library Group, and was shared with Anna's Archive by a volunteer for long-term preservation.
- Anna's Archive is open to other proposals and ideas for collaboration and encourages interested parties to contact them.