Real-time Speech Summarization for Medical Conversations
Description:
In doctor-patient conversations, identifying medically relevant information is crucial, which creates a need for conversation summarization. First, we propose the first deployable real-time speech summarization system for real-world applications in industry. It generates a local summary after every N speech utterances within a conversation and a global summary after the conversation ends. Our system could enhance user experience from a business standpoint while also reducing computational costs from a technical perspective. Second, we present VietMed-Sum, which, to our knowledge, is the first speech summarization dataset for medical conversations. Third, we are the first to use LLMs and human annotators collaboratively to create gold-standard and synthetic summaries for medical conversation summarization. Finally, we present baseline results of state-of-the-art models on VietMed-Sum. All code, data (English-translated and Vietnamese) and models are available online: https://github.com/leduckhai/MultiMed/tree/master/VietMed-Sum
Please cite this paper: https://arxiv.org/abs/2406.15888
@article{VietMed_Sum,
title={Real-time Speech Summarization for Medical Conversations},
author={Le-Duc, Khai and Nguyen, Khai-Nguyen and Vo-Dang, Long and Hy, Truong-Son},
journal={arXiv preprint arXiv:2406.15888},
booktitle={Interspeech 2024},
url={https://arxiv.org/abs/2406.15888},
year={2024}
}
Model Card for monishsystem/medisum_vit5
Model Details
Model Description
This model summarizes medical dialogues in Vietnamese. It can work in tandem with an automatic speech recognition (ASR) system to provide real-time dialogue summaries.
- Developed by: Khai-Nguyen Nguyen
- Language(s) (NLP): Vietnamese
- Finetuned from model: ViT5
How to Get Started with the Model
Install the prerequisite Python packages. The pipeline below runs on PyTorch, so install it alongside transformers.
pip install transformers torch
Use the code below to get started with the model.
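import torch
from transformers import pipeline
# Use the first CUDA GPU if one is available; otherwise fall back to the CPU (-1)
device = 0 if torch.cuda.is_available() else -1
# Initialize the text2text-generation pipeline with the fine-tuned ViT5 model
pipe = pipeline("text2text-generation", model="monishsystem/medisum_vit5", device=device)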
# Example text in Vietnamese describing a traditional medicine product
example = "Loại thuốc này chứa các thành phần đông y đặc biệt tốt cho sức khoẻ, giúp tăng cường sinh lý và bổ thận tráng dương, đặc biệt tốt cho người cao tuổi và người có bệnh lý nền"
# Generate a summary for the input text, limited to 50 newly generated tokens
summary = pipe(example, max_new_tokens=50)
# The pipeline returns a list of dicts; print only the generated summary text
print(summary[0]["generated_text"])
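The description above mentions a local summary after every N speech utterances and a global summary once the conversation ends. The following is a minimal sketch of that loop, not the authors' implementation. The function name `summarize_stream` and its parameters are hypothetical. The sketch assumes that leftover utterances at the end of the conversation also receive a local summary, and that the global summary is computed over the full transcript; the system described in the paper may handle both differently.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def summarize_stream(
    utterances: Iterable[str],
    summarize: Callable[[str], str],
    n: int = 3,
) -> Iterator[Tuple[str, str]]:
    """Yield ("local", summary) after every n utterances, then ("global", summary)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    transcript: List[str] = []  # every utterance seen so far
    window: List[str] = []      # utterances since the last local summary
    for utterance in utterances:
        transcript.append(utterance)
        window.append(utterance)
        if len(window) == n:
            yield "local", summarize(" ".join(window))
            window = []
    # Assumption: a trailing window with fewer than n utterances is still summarized.
    if window:
        yield "local", summarize(" ".join(window))
    # Assumption: the global summary is computed over the full transcript.
    if transcript:
        yield "global", summarize(" ".join(transcript))
```

To use it with the pipeline from the snippet above, one option is to pass `lambda text: pipe(text, max_new_tokens=50)[0]["generated_text"]` as `summarize` and feed in the ASR output one utterance at a time.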