Institutional Books (Collection): A growing corpus of public domain books from library collections, seeded by Harvard Library. • 10 items
Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability • Paper • 2506.08300 • Published Jun 10, 2025
Towards Best Practices for Open Datasets for LLM Training • Paper • 2501.08365 • Published Jan 14, 2025