Whisper Large V3 - Romanized Spoken Telugu
This model is a fine-tuned version of openai/whisper-large-v3 on the Telugu Romanized 1.0 dataset. It achieves the following results on the evaluation set:
- eval_loss: 1.5009
- eval_wer: 68.1275
- eval_runtime: 591.6137
- eval_samples_per_second: 0.798
- eval_steps_per_second: 0.1
- epoch: 8.6207
- step: 1000
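For readers unfamiliar with the metric, here is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance between the reference and the model output, divided by the number of reference words. It assumes the eval_wer value above is on a percentage scale. The function name and the example strings are illustrative and are not taken from the evaluation script used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing from output)
                curr[j - 1] + 1,     # insertion (extra word in output)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Illustrative romanized strings: one substituted word out of three.
print(word_error_rate("nenu intiki veltunnanu", "nenu intiki velthunnanu"))
```

Romanized spellings are not standardized, so a metric based on exact word matches counts a spelling variant such as the one above as a full word error.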
Model description
The model is trained to transcribe Telugu conversations into Romanized script, which is how most people write Telugu in day-to-day life.
Intended uses & limitations
Limitations: The model sometimes translates the audio directly into English instead of transcribing it in Romanized Telugu. A fix is in progress.
Training and evaluation data
The GPT-4 API was used to convert the Telugu labels of the Google FLEURS dataset to Romanized script. I used the English tokenizer to train the model, since the Romanized script is written in the English alphabet.
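Since the labels were converted by machine, one possible sanity check is to confirm that no Telugu-script characters remain in a converted label. The sketch below is a hypothetical helper, not part of the author's pipeline; it relies only on the fact that the Unicode "Telugu" block spans U+0C00 to U+0C7F.

```python
# Unicode block "Telugu": U+0C00 through U+0C7F.
TELUGU_BLOCK = range(0x0C00, 0x0C80)


def contains_telugu_script(text: str) -> bool:
    """True if any character of `text` lies in the Telugu Unicode block."""
    return any(ord(ch) in TELUGU_BLOCK for ch in text)


def is_romanized(text: str) -> bool:
    """True if `text` is non-empty and contains no Telugu-script characters."""
    return bool(text.strip()) and not contains_telugu_script(text)


print(is_romanized("namaskaram"))   # Latin letters only
print(is_romanized("నమస్కారం"))      # Telugu script
```

A check like this only catches leftover native script. It cannot tell Romanized Telugu apart from an English translation, which is the failure mode described under the limitations.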
Usage
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
import torch

# Use the GPU with half precision when available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "jayasuryajsk/whisper-large-v3-Telugu-Romanized"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Long audio is split into 30-second chunks and transcribed in batches.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# The language is set to "english" because the model was trained with the English tokenizer.
result = pipe("recording.mp3", generate_kwargs={"language": "english"})
print(result["text"])
Try this on Colab: https://colab.research.google.com/drive/1KxWSaxZThv8PE4mDoLfJv0O7L-5hQ1lE?usp=sharing
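Because the pipeline above is created with return_timestamps=True, its result also carries a "chunks" list in which each entry has a "timestamp" pair and a "text" string. The following sketch shows one way to print those segments. The helper name format_chunks and the mock result are illustrative; the mock stands in for real pipeline output so the snippet runs without downloading the model.

```python
def format_chunks(result: dict) -> list[str]:
    """Render each timestamped chunk as '[start - end] text'."""

    def fmt(seconds):
        # A boundary can be None, for example when the audio ends mid-segment.
        return f"{seconds:.2f}" if seconds is not None else "?"

    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{fmt(start)} - {fmt(end)}] {chunk['text'].strip()}")
    return lines


# Mock output with the same shape as the pipeline result (illustrative text).
mock_result = {
    "text": " nenu intiki veltunnanu",
    "chunks": [
        {"timestamp": (0.0, 2.5), "text": " nenu intiki"},
        {"timestamp": (2.5, None), "text": " veltunnanu"},
    ],
}
for line in format_chunks(mock_result):
    print(line)
```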
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 20
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 2000
- mixed_precision_training: Native AMP
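To make the schedule concrete, the sketch below computes the learning rate at a given step. It assumes the "linear" scheduler warms up linearly from 0 to the peak rate over the warmup steps and then decays linearly to 0 at the final training step. The function name is illustrative; the default values are the hyperparameters listed above.

```python
def linear_schedule_lr(step: int,
                       base_lr: float = 1e-5,
                       warmup_steps: int = 500,
                       total_steps: int = 2000) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 1000, 2000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions, the checkpoint evaluated above (step 1000) would sit one third of the way through the decay phase, at roughly two thirds of the peak learning rate.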
Framework versions
- Transformers 4.40.1
- Pytorch 2.2.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1