arxiv:2505.12120

HISTAI: An Open-Source, Large-Scale Whole Slide Image Dataset for Computational Pathology

Published on May 17

· Submitted by

mgvz on May 20

Upvote

Authors:

Dmitry Nechaev ,

Alexey Pchelnikov ,

Abstract

Recent advancements in Digital Pathology (DP), particularly through artificial intelligence and Foundation Models, have underscored the importance of large-scale, diverse, and richly annotated datasets. Despite their critical role, publicly available Whole Slide Image (WSI) datasets often lack sufficient scale, tissue diversity, and comprehensive clinical metadata, limiting the robustness and generalizability of AI models. In response, we introduce the HISTAI dataset, a large, multimodal, open-access WSI collection comprising over 60,000 slides from various tissue types. Each case in the HISTAI dataset is accompanied by extensive clinical metadata, including diagnosis, demographic information, detailed pathological annotations, and standardized diagnostic coding. The dataset aims to fill gaps identified in existing resources, promoting innovation, reproducibility, and the development of clinically relevant computational pathology solutions. The dataset can be accessed at https://github.com/HistAI/HISTAI.

View arXiv page View PDF Add to collection

Community

mgvz

Paper author Paper submitter 4 days ago

New WSI dataset

librarian-bot

4 days ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2505.12120 in a model README.md to link it from this page.

Datasets citing this paper 9

Browse 9 datasets citing this paper

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2505.12120 in a Space README.md to link it from this page.