---
library_name: transformers
datasets:
- leduckhai/VietMed-Sum
language:
- vi
pipeline_tag: summarization
---

# Real-time Speech Summarization for Medical Conversations

<div align="center"><b>Interspeech 2024 (Oral)</b></div>

<div align="center">Khai Le-Duc*, Khai-Nguyen Nguyen*, Long Vo-Dang, Truong-Son Hy</div>

<div align="center">*Equal contribution</div>

<p align="center">
<img src="RTSS_diagram.png" alt="drawing" width="900"/>
</p>

## Description

In doctor-patient conversations, identifying medically relevant information is crucial, which motivates conversation summarization. This work makes four contributions. We propose the first deployable real-time speech summarization system for real-world industry applications. It generates a local summary after every N speech utterances within a conversation and a global summary once the conversation ends. From a business standpoint this could improve user experience, and from a technical perspective it reduces computational costs. We also present VietMed-Sum, which, to our knowledge, is the first speech summarization dataset for medical conversations. In addition, we are the first to use an LLM and human annotators collaboratively to create gold-standard and synthetic summaries for medical conversation summarization. Finally, we report baseline results of state-of-the-art models on VietMed-Sum.

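The local/global scheme described above can be sketched as a small Python generator. This is a minimal sketch under stated assumptions, not the authors' implementation. `stream_summaries` is a hypothetical helper name. The following choices are all assumptions:

- the window size `n`;
- joining utterances with newlines;
- emitting a local summary for a final window shorter than `n`;
- computing the global summary from the full transcript.

`summarize` is any callable that maps text to a summary string.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def stream_summaries(
    utterances: Iterable[str],
    summarize: Callable[[str], str],
    n: int = 4,
) -> Iterator[Tuple[str, str]]:
    """Yield ("local", summary) every n utterances and ("global", summary) at the end.

    Hypothetical helper: the joining, tail-flush, and global-summary strategy are
    assumptions, not taken from the VietMed-Sum code.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    window: List[str] = []      # utterances since the last local summary
    transcript: List[str] = []  # every utterance seen so far

    for utterance in utterances:  # e.g. ASR output arriving one utterance at a time
        window.append(utterance)
        transcript.append(utterance)
        if len(window) == n:
            # Local summary as soon as n new utterances have arrived.
            yield "local", summarize("\n".join(window))
            window.clear()

    if window:
        # Assumption: a trailing window shorter than n still gets a local summary.
        yield "local", summarize("\n".join(window))

    if transcript:
        # Assumption: the global summary is computed from the whole transcript.
        yield "global", summarize("\n".join(transcript))


if __name__ == "__main__":
    # Stand-in summarizer that only reports how many utterances it received.
    def count_lines(text: str) -> str:
        return f"<{len(text.splitlines())} utterances>"

    for kind, summary in stream_summaries([f"u{i}" for i in range(10)], count_lines, n=4):
        print(kind, summary)
```

With the pipeline from the usage example in this card, `summarize` could be `lambda text: pipe(text, max_new_tokens=50)[0]["generated_text"]`. Because the helper is a generator, each local summary can be shown to the user while the conversation is still in progress.
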
|
All code, data (English-translated and Vietnamese) and models are available online: [https://github.com/leduckhai/MultiMed/tree/master/VietMed-Sum](https://github.com/leduckhai/MultiMed/tree/master/VietMed-Sum) |
|
|
|
|
|
Please cite this paper: https://arxiv.org/abs/2406.15888

```bibtex
@article{VietMed_Sum,
  title={Real-time Speech Summarization for Medical Conversations},
  author={Le-Duc, Khai and Nguyen, Khai-Nguyen and Vo-Dang, Long and Hy, Truong-Son},
  journal={arXiv preprint arXiv:2406.15888},
  booktitle={Interspeech 2024},
  url = {https://arxiv.org/abs/2406.15888},
  year={2024}
}
```

# Model Card

## Model Details

### Model Description

This model summarizes medical dialogues in Vietnamese. It can work in tandem with an automatic speech recognition (ASR) system to provide real-time dialogue summaries.

- **Developed by:** Khai-Nguyen Nguyen
- **Language(s) (NLP):** Vietnamese
- **Fine-tuned from model:** ViT5

## How to Get Started with the Model

Install the prerequisite packages (the pipeline below runs on PyTorch):

```bash
pip install transformers torch
```

Use the code below to get started with the model.

```python
import torch
from transformers import pipeline

# Initialize the pipeline with the fine-tuned ViT5 model.
# Use the first GPU if CUDA is available, otherwise fall back to the CPU.
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("text2text-generation", model="monishsystem/medisum_vit5", device=device)

# Example Vietnamese text describing a traditional-medicine product. Roughly: "This medicine
# contains traditional-medicine ingredients that are especially good for health, helping to
# boost vitality and tonify the kidneys, especially good for the elderly and people with
# underlying conditions."
example = "Loại thuốc này chứa các thành phần đông y đặc biệt tốt cho sức khoẻ, giúp tăng cường sinh lý và bổ thận tráng dương, đặc biệt tốt cho người cao tuổi và người có bệnh lý nền"

# Generate a summary of at most 50 new tokens
summary = pipe(example, max_new_tokens=50)

# The pipeline returns a list of dicts; print the generated summary text
print(summary[0]["generated_text"])
```
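
To summarize an already-transcribed conversation offline, you can pass the `text2text-generation` pipeline a list of strings. It returns one `{"generated_text": ...}` dict per input, so several utterance windows can be summarized in a single call. The helpers below, `make_windows` and `summarize_windows`, are a hypothetical sketch. The default window of 4 utterances and the newline joining are assumptions, not settings from the paper.

```python
from typing import Any, Callable, List, Sequence


def make_windows(utterances: Sequence[str], n: int) -> List[str]:
    """Group consecutive utterances into newline-joined windows of at most n."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return ["\n".join(utterances[i:i + n]) for i in range(0, len(utterances), n)]


def summarize_windows(
    pipe: Callable[..., Any],
    utterances: Sequence[str],
    n: int = 4,
    max_new_tokens: int = 50,
) -> List[str]:
    """Summarize every window of n utterances with one batched pipeline call."""
    windows = make_windows(utterances, n)
    if not windows:
        return []
    outputs = pipe(windows, max_new_tokens=max_new_tokens)
    # One {"generated_text": ...} dict per input string.
    return [out["generated_text"] for out in outputs]
```

Here `pipe` is the pipeline created above, and `utterances` is a list of transcribed Vietnamese utterances. A call then looks like `summarize_windows(pipe, utterances, n=4)`. Taking `pipe` as a parameter keeps the helpers independent of model loading, so they can be tested with a stand-in callable.
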