BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP Paper • 2506.10896 • Published 1 day ago • 1 • 2
Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training Paper • 2506.10952 • Published 1 day ago • 20
Institutional Books Collection A growing corpus of public domain books from library collections, seeded by Harvard Library. • 3 items • Updated 2 days ago • 1
institutional/institutional-books-topic-classifier-bert Text Classification • Updated 2 days ago • 15 • 4
Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability Paper • 2506.08300 • Published 4 days ago • 6 • 3