license: apache-2.0
datasets:
- ivrit-ai/crowd-transcribe-v5
language:
- he
base_model:
- openai/whisper-large-v3-turbo
This is ivrit.ai's faster-whisper model, based on the ivrit-ai/whisper-large-v3-turbo Whisper model.
Training data includes 295 hours of volunteer-transcribed speech from the ivrit-ai/crowd-transcribe-v5 dataset, as well as 93 hours of professionally transcribed speech from other sources.
Release date: TBD
# Prerequisites
pip3 install faster_whisper
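# Usage
```
import faster_whisper

# Load the CTranslate2-converted model (downloaded from the Hugging Face Hub on first use).
model = faster_whisper.WhisperModel('ivrit-ai/whisper-large-v3-turbo-ct2')

# transcribe() returns a lazy generator of segments plus an info object;
# decoding only runs as the segments are iterated.
segs, _ = model.transcribe('media-file', language='he')

# Consume the generator and join the per-segment text into one string.
texts = [s.text for s in segs]
transcribed_text = ' '.join(texts)
print(f'Transcribed text: {transcribed_text}')
```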
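
The usage snippet above keeps only the text of each segment. Each segment returned by faster-whisper also carries `start` and `end` offsets in seconds, so a timestamped transcript can be built the same way. The following is a minimal sketch, not part of the official model card: the helper names `format_timestamp`, `format_segments`, and `transcribe_with_timestamps` are hypothetical, and the demo segments are stand-ins so the formatting can be tried without downloading the model.

```python
from types import SimpleNamespace


def format_timestamp(seconds):
    """Render a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f'{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}'


def format_segments(segments):
    """Turn an iterable of segments (.start, .end, .text) into timestamped lines."""
    return [
        f'[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}'
        for s in segments
    ]


def transcribe_with_timestamps(media_path,
                               model_name='ivrit-ai/whisper-large-v3-turbo-ct2'):
    """Hypothetical wrapper: transcribe a Hebrew media file, one line per segment."""
    import faster_whisper  # imported lazily so the helpers above work without it

    model = faster_whisper.WhisperModel(model_name)
    segs, _ = model.transcribe(media_path, language='he')
    return format_segments(segs)


# Stand-in segments for illustration only; real ones come from model.transcribe().
demo_segments = [
    SimpleNamespace(start=0.0, end=2.5, text=' first'),
    SimpleNamespace(start=2.5, end=4.0, text=' second'),
]
demo_lines = format_segments(demo_segments)
print('\n'.join(demo_lines))
```

With the model installed, `print('\n'.join(transcribe_with_timestamps('media-file')))` would print one line per segment for a real recording. Because the segments are consumed inside `format_segments`, the transcription itself runs during that call.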