---
license: apache-2.0
language:
- th
base_model: biodatlab/whisper-th-medium-combined
tags:
- whisper
- Pytorch
---
# Whisper-th-medium-ct2
whisper-th-medium-ct2 is the CTranslate2 conversion of [biodatlab/whisper-th-medium-combined](https://huggingface.co/biodatlab/whisper-th-medium-combined), compatible with [WhisperX](https://github.com/m-bain/whisperX) and [faster-whisper](https://github.com/SYSTRAN/faster-whisper), which enables:
- 🤏 **Half the size** of the original Hugging Face format.
- ⚡️ Batched inference for **70x** real-time transcription.
- 🪶 A faster-whisper backend, requiring **<8GB GPU memory** with beam_size=5.
- 🎯 Accurate word-level timestamps using wav2vec2 alignment.
- 👯‍♂️ Multispeaker ASR using speaker diarization (includes speaker ID labels).
- 🗣️ VAD preprocessing, reducing hallucinations and allowing batching with no WER degradation.
### Usage
```python
# Notebook install command; from a shell, run it without the leading "!"
!pip install git+https://github.com/m-bain/whisperx.git
import whisperx
import time
# Settings
device = "cuda"
audio_file = "audio.mp3"
batch_size = 16
compute_type = "float16"
# A Hugging Face token is required for the diarization model.
# You must also accept the model's terms and conditions before use.
# Model page: https://huggingface.co/pyannote/segmentation-3.0
HF_TOKEN = ""
# Load the model and transcribe the audio
model = whisperx.load_model("Thaweewat/whisper-th-medium-ct2", device, compute_type=compute_type)
st_time = time.time()
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
# Assign speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)
# Join the segment texts into one plain-text string if needed
combined_text = ' '.join(segment['text'] for segment in result['segments'])
print(f"Response time: {time.time() - st_time} seconds")
print(diarize_segments)
print(result)
print(combined_text)
```
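
To read the diarized output as a conversation, the segments can be grouped by speaker. The following is a minimal sketch, not part of WhisperX: it assumes each entry in `result["segments"]` is a dict with `start`, `end`, and `text` keys, and that `assign_word_speakers` adds a `speaker` key where a speaker could be matched. The helper name `format_speaker_transcript` and the sample data are hypothetical.

```python
def format_speaker_transcript(segments, unknown="UNKNOWN"):
    """Merge consecutive segments from the same speaker into labeled lines.

    Assumes each segment is a dict with "start", "end", and "text" keys and
    an optional "speaker" key (an assumption about the diarized output).
    """
    turns = []
    for seg in segments:
        speaker = seg.get("speaker", unknown)
        text = seg.get("text", "").strip()
        if not text:
            continue  # skip empty segments
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker as the previous turn: extend that turn
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = seg["end"]
        else:
            turns.append(
                {"speaker": speaker, "start": seg["start"], "end": seg["end"], "text": text}
            )
    return [
        f"[{t['start']:.2f}-{t['end']:.2f}] {t['speaker']}: {t['text']}" for t in turns
    ]


# Hypothetical sample shaped like result["segments"]
sample = [
    {"start": 0.0, "end": 1.5, "text": " สวัสดีครับ", "speaker": "SPEAKER_00"},
    {"start": 1.5, "end": 3.0, "text": " วันนี้เป็นอย่างไรบ้าง", "speaker": "SPEAKER_00"},
    {"start": 3.2, "end": 4.0, "text": " สบายดีค่ะ", "speaker": "SPEAKER_01"},
    {"start": 4.0, "end": 4.5, "text": " ขอบคุณ"},  # no speaker assigned
]

for line in format_speaker_transcript(sample):
    print(line)
```

With real output, pass `result["segments"]` in place of `sample`. Segments without a `speaker` key fall back to the `unknown` label rather than raising an error.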