WhisperConformer
WhisperConformer is an ASR model that combines the Whisper and Conformer architectures, pairing global context modeling with local feature extraction to improve recognition accuracy. The model is trained on 387 hours of Thai speech data.
This model can be fine-tuned by following the Fine-Tune Whisper with 🤗 Transformers guide.
Usage
pip install --upgrade pip
pip install WhisperConformer
The model can be used with the pipeline class to transcribe audio of arbitrary length:
from transformers import pipeline, WhisperTokenizer, WhisperFeatureExtractor
from WhisperConformer import WhisperConformerModel

model_name = "Thanakron/whisperConformer-medium-th"

# Load the preprocessing components and the model weights from the Hub
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_name)
tokenizer = WhisperTokenizer.from_pretrained(model_name)
model = WhisperConformerModel.from_pretrained(model_name)

pipe = pipeline(
    task="automatic-speech-recognition",
    model=model,
    tokenizer=tokenizer,
    feature_extractor=feature_extractor,
)

def transcribe(audio):
    # The pipeline returns a dict; the transcription is under the "text" key
    text = pipe(audio)["text"]
    return text

print(transcribe("audio.wav"))
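For long recordings, the standard Transformers ASR pipeline also supports chunked inference with timestamps (`chunk_length_s`, `return_timestamps`); these are generic pipeline parameters, not specific to WhisperConformer, so treat this as a sketch. The small helper below formats the pipeline's chunked output into readable lines:

```python
# Sketch: formatting chunked pipeline output.
# Assumes the standard transformers ASR pipeline output shape:
# {"text": ..., "chunks": [{"timestamp": (start, end), "text": ...}, ...]}

def format_chunks(result):
    """Render pipeline output chunks as '[start-end] text' lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{start:.1f}-{end:.1f}] {chunk['text'].strip()}")
    return "\n".join(lines)

# Usage (requires the model and an audio file; not run here):
# result = pipe("long_audio.wav", chunk_length_s=30, return_timestamps=True)
# print(format_chunks(result))
```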
Model tree for Thanakron/whisperConformer-medium-th
Base model: openai/whisper-medium