metadata
license: mit
language:
  - ru
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
  - asr
  - gigaam
  - stt
  - audio
  - speech
  - rnnt
  - transducer

GigaAM-v2-RNNT 🤗 Hugging Face transformers

Russian ASR model GigaAM-v2-RNNT.

Model info

This is the original GigaAM-v2-RNNT model with a transformers library interface.

The file gigaam_transformers.py contains the model, feature extractor and tokenizer classes with the usual transformers methods. The model can be initialized with the transformers auto classes (see the example in the Usage section).

Installation

Library versions I use:

  • torch 2.5.1
  • torchaudio 2.5.1
  • transformers 4.49.0
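
To compare a local environment against the versions listed above, a small check like the following may help. It is only a sketch: the `README_VERSIONS` dictionary and the `compare_versions` helper are hypothetical names introduced here, not part of this repository, and a differing version is not known to break the model; it simply has not been reported here.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README (hypothetical constant, for illustration only).
README_VERSIONS = {
    "torch": "2.5.1",
    "torchaudio": "2.5.1",
    "transformers": "4.49.0",
}


def compare_versions(expected):
    """Return {package: (expected, installed)}; installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions(README_VERSIONS).items():
        if installed is None:
            status = "not installed"
        elif installed.split("+")[0] == wanted:
            # Local build tags such as "+cu121" are ignored in the comparison.
            status = "matches"
        else:
            status = "differs"
        print(f"{name}: README {wanted}, installed {installed} ({status})")
```

The check reads installed package metadata only, so it does not import torch or transformers and runs quickly even in a fresh environment.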

Usage

Usage is the same as for other transformers ASR models.

from transformers import AutoModel, AutoProcessor
import torch
import torchaudio

# load audio; wav has shape (channels, samples)
wav, sr = torchaudio.load("audio.wav")
# resample to 16 kHz, the sampling rate passed to the processor below
wav = torchaudio.functional.resample(wav, sr, 16000)

# load model and processor
processor = AutoProcessor.from_pretrained("waveletdeboshir/gigaam-rnnt", trust_remote_code=True)
model = AutoModel.from_pretrained("waveletdeboshir/gigaam-rnnt", trust_remote_code=True)
model.eval()

# extract input features from the first channel only
input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

# greedy prediction
with torch.no_grad():
    pred_ids = model.generate(**input_features)

# decode token ids to text
transcription = processor.batch_decode(pred_ids)[0]

Fine-tune