This model was trained on two A100 40 GB GPUs, with 128 GB of RAM and 2 x Xeon 48-core 2.4 GHz CPUs.

  • Training time: ~7 hours
  • Training data: 118k audio samples from Mozilla Common Voice 17

Usage example

from transformers import pipeline
import gradio as gr
import time

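# Load the fine-tuned checkpoint; switch device to 'cuda:0' if a GPU is available
pipe = pipeline(
    model="dvislobokov/whisper-large-v3-turbo-russian",
    tokenizer="dvislobokov/whisper-large-v3-turbo-russian",
    task='automatic-speech-recognition',
    device='cpu'
)

def transcribe(audio):
    # audio is a file path supplied by the Gradio Audio component
    start = time.time()
    # return_timestamps=True lets Whisper process recordings longer than 30 seconds
    text = pipe(audio, return_timestamps=True)['text']
    # Log how long the transcription took
    print(f'Transcription took {time.time() - start:.2f} s')
    return text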

iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=['microphone', 'upload'], type='filepath'),
    outputs='text'
)

iface.launch(share=True)
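
The example above requests timestamps but keeps only the 'text' field. As a minimal sketch, assuming the pipeline result also carries a 'chunks' list of {'timestamp': (start, end), 'text': ...} entries (the usual shape when return_timestamps=True), the segments could be printed one per line. The helper names format_timestamp, format_chunks and transcribe_with_timestamps, and the file name sample.wav, are hypothetical and not part of the original card.

```python
def format_timestamp(seconds):
    """Render seconds as MM:SS.cc; None (an open-ended segment) becomes a placeholder."""
    if seconds is None:
        return "??:??.??"
    centis = int(round(float(seconds) * 100))
    minutes, rest = divmod(centis, 6000)
    secs, cs = divmod(rest, 100)
    return f"{minutes:02d}:{secs:02d}.{cs:02d}"


def format_chunks(result):
    """Turn a result dict with a 'chunks' list into one line per segment."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}")
    return "\n".join(lines)


def transcribe_with_timestamps(path):
    """Hypothetical wrapper: run the model on one file and return formatted segments."""
    # Imported lazily so the formatting helpers work without transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        model="dvislobokov/whisper-large-v3-turbo-russian",
        task="automatic-speech-recognition",
        device="cpu",
    )
    return format_chunks(pipe(path, return_timestamps=True))
```

Calling print(transcribe_with_timestamps("sample.wav")) on a local recording would then list each segment with its start and end time. The formatting helpers are kept separate from the model call so they can be checked on a hand-written result dict.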

Model size: 809M params (Safetensors, tensor type F32)