WhisperConformer

WhisperConformer is an ASR model that combines the Whisper and Conformer architectures: self-attention captures global context while convolution performs local feature extraction, improving recognition accuracy. The model was trained on 387 hours of Thai speech data.
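The Conformer's characteristic component is a convolution module that complements self-attention: attention models long-range (global) context, while a depthwise convolution captures local acoustic patterns. The sketch below illustrates a standard Conformer convolution module only; it is not the exact code of this checkpoint, and the class and parameter names are assumptions.

import torch
import torch.nn as nn

class ConformerConvModule(nn.Module):
    """Illustrative Conformer convolution module (local feature extraction).

    Follows the standard Conformer design (pointwise conv -> GLU ->
    depthwise conv -> BatchNorm -> SiLU -> pointwise conv). The layers
    actually used inside WhisperConformer are not documented here.
    """

    def __init__(self, d_model: int, kernel_size: int = 31, dropout: float = 0.1):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model)
        self.pointwise_conv1 = nn.Conv1d(d_model, 2 * d_model, kernel_size=1)
        self.glu = nn.GLU(dim=1)
        self.depthwise_conv = nn.Conv1d(
            d_model, d_model, kernel_size, padding=kernel_size // 2, groups=d_model
        )
        self.batch_norm = nn.BatchNorm1d(d_model)
        self.activation = nn.SiLU()
        self.pointwise_conv2 = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model); residual connection around the module
        residual = x
        x = self.layer_norm(x).transpose(1, 2)   # (batch, d_model, time)
        x = self.glu(self.pointwise_conv1(x))    # gated pointwise projection
        x = self.activation(self.batch_norm(self.depthwise_conv(x)))
        x = self.dropout(self.pointwise_conv2(x)).transpose(1, 2)
        return residual + x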

This model can be fine-tuned by following the Fine-Tune Whisper with 🤗 Transformers guide; a brief sketch is shown below.
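A minimal fine-tuning sketch in the style of that guide, assuming WhisperConformerModel can be used as a drop-in seq2seq model with Seq2SeqTrainer and that the datasets and data collator have been prepared as in the guide (columns input_features and labels). The function name, hyperparameters, and output directory are illustrative only.

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments, WhisperProcessor
from WhisperConformer import WhisperConformerModel

def finetune(train_dataset, eval_dataset, data_collator,
             model_name="Thanakron/whisperConformer-medium-th",
             output_dir="./whisperconformer-th-finetuned"):
    """Fine-tune on datasets prepared as in the Whisper fine-tuning guide."""
    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperConformerModel.from_pretrained(model_name)

    training_args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=8,
        learning_rate=1e-5,
        max_steps=4000,
        fp16=True,
        predict_with_generate=True,
    )
    trainer = Seq2SeqTrainer(
        args=training_args,
        model=model,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=data_collator,   # e.g. the padding collator from the guide
        tokenizer=processor.feature_extractor,
    )
    trainer.train()
    return trainer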

Usage

pip install --upgrade pip
pip install WhisperConformer

The model can be used with the pipeline class to transcribe audio files of arbitrary length:

from transformers import pipeline, WhisperTokenizer, WhisperFeatureExtractor
from WhisperConformer import WhisperConformerModel

model_name = "Thanakron/whisperConformer-medium-th"

# Load the feature extractor, tokenizer, and model weights from the Hub
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_name)
tokenizer = WhisperTokenizer.from_pretrained(model_name)
model = WhisperConformerModel.from_pretrained(model_name)

# Build an ASR pipeline around the custom model
pipe = pipeline(
    task="automatic-speech-recognition",
    model=model,
    tokenizer=tokenizer,
    feature_extractor=feature_extractor,
)

def transcribe(audio):
    """Transcribe an audio file (path, URL, or array) and return the text."""
    return pipe(audio)["text"]

print(transcribe("audio.wav"))
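For long recordings, the pipeline can process the audio in chunks. chunk_length_s and batch_size are standard arguments of the 🤗 Transformers ASR pipeline; their behavior with this custom model class is assumed to match standard Whisper, and the file name is a placeholder.

# Optional: chunked long-form transcription
pipe = pipeline(
    task="automatic-speech-recognition",
    model=model,
    tokenizer=tokenizer,
    feature_extractor=feature_extractor,
    chunk_length_s=30,  # split long audio into 30-second windows
    batch_size=8,       # decode several chunks in parallel
)
print(pipe("long_audio.wav")["text"])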
Model size: 740M parameters (F32, Safetensors)
