YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder

Description:

Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for various downstream applications such as speech translation, spoken language understanding, and voice-activated assistants. This technology enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we introduce MultiMed, a collection of small-to-large end-to-end ASR models for the medical domain, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese, together with the corresponding real-world ASR dataset. To our best knowledge, MultiMed stands as the largest and the first multilingual medical ASR dataset, in terms of total duration, number of speakers, diversity of diseases, recording conditions, speaker roles, unique medical terms, accents, and ICD-10 codes.

Please cite this paper: https://arxiv.org/abs/2409.14074

@inproceedings{le2024multimed,
  title={MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder},
  author={Le-Duc, Khai and Phan, Phuc and Pham, Tan-Hanh and Tat, Bach Phan and Ngo, Minh-Huong and Hy, Truong-Son},
  journal={arXiv preprint arXiv:2409.14074},
  year={2024}
}

To load labeled data, please refer to our HuggingFace, Paperswithcodes.

Contact:

If any links are broken, please contact me for fixing!

Le Duc Khai
University of Toronto, Canada
Email: [email protected]
GitHub: https://github.com/leduckhai
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support