Specimen Label Transcription Project

university

AI & ML interests

This repository, hosted by the University of Michigan Herbarium, provides open-source datasets and models for transcribing natural history collection labels with LLMs. Partner institutions will periodically add new training datasets (OCR and human-transcribed) and benchmarking datasets used to rank the performance of models and methods. For code to create benchmark datasets or analyze model performance, please visit the GitHub repo. To join the SLTP initiative, please email [email protected].

Recent Activity

phyloforfun updated the dataset SLTP/HLT-AA-C21-Alpaca over 1 year ago