---
license: mit
language:
- ru
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- asr
- gigaam
- stt
- audio
- speech
- rnnt
- transducer
---

# GigaAM-v2-RNNT 🤗 Hugging Face transformers

* Original repository: https://github.com/salute-developers/GigaAM

Russian ASR model GigaAM-v2-RNNT.

## Model info

This is the original GigaAM-v2-RNNT model wrapped in a `transformers` library interface.

The file [`gigaam_transformers.py`](https://huggingface.co/waveletdeboshir/gigaam-rnnt/blob/main/gigaam_transformers.py) contains the model, feature extractor and tokenizer classes with the usual `transformers` methods, so the model can be initialized with the `transformers` auto classes (see the example below).

## Installation

Tested with the following library versions:
* `torch` 2.5.1
* `torchaudio` 2.5.1
* `transformers` 4.49.0

## Usage

Usage is the same as for other `transformers` ASR models.

```python
from transformers import AutoModel, AutoProcessor
import torch
import torchaudio

# load audio
wav, sr = torchaudio.load("audio.wav")

# resample to 16 kHz if necessary
wav = torchaudio.functional.resample(wav, sr, 16000)

# load model and processor
processor = AutoProcessor.from_pretrained("waveletdeboshir/gigaam-rnnt", trust_remote_code=True)
model = AutoModel.from_pretrained("waveletdeboshir/gigaam-rnnt", trust_remote_code=True)
model.eval()

input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

# greedy prediction
with torch.no_grad():
    pred_ids = model.generate(**input_features)

# decode token ids to text
transcription = processor.batch_decode(pred_ids)[0]
```

## Fine-tune
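The wrapper's training interface is not documented in this card, so the following is only a rough sketch of one possible fine-tuning loop, not an official recipe. It assumes (1) the remote-code model follows the usual `transformers` convention of accepting `labels` in its forward pass and returning an output with a `.loss` attribute (for an RNNT model this would be the transducer loss), and (2) the bundled tokenizer can encode target transcripts to token ids. `train_pairs` below is a toy stand-in for a real dataset.

```python
import torch
from transformers import AutoModel, AutoProcessor

processor = AutoProcessor.from_pretrained("waveletdeboshir/gigaam-rnnt", trust_remote_code=True)
model = AutoModel.from_pretrained("waveletdeboshir/gigaam-rnnt", trust_remote_code=True)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# toy stand-in for a real dataset: (waveform, transcript) pairs,
# waveforms mono and already resampled to 16 kHz
train_pairs = [(torch.randn(16000), "пример транскрипции")]

for wav, text in train_pairs:
    inputs = processor(wav, sampling_rate=16000, return_tensors="pt")
    # ASSUMPTION: the bundled tokenizer encodes transcripts to token ids
    labels = processor.tokenizer(text, return_tensors="pt").input_ids

    # ASSUMPTION: forward accepts `labels` and returns an output with `.loss`
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

If the wrapper exposes a different forward signature, the loss has to be computed manually instead (e.g. with `torchaudio.functional.rnnt_loss` on the joint-network logits).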