---
license: mit
language:
- ru
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- asr
- gigaam
- stt
- ru
- ctc
- audio
- speech
---

[![Finetune In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/waveletdeboshir/c01334561f23c5167598b2054e50839a/gigaam-ctc-hf-finetune.ipynb)

# GigaAM-v2-CTC 🤗 Hugging Face transformers

* Original repository: https://github.com/salute-developers/GigaAM

Russian ASR model GigaAM-v2-CTC.

## Model info

This is the original GigaAM-v2-CTC model with a `transformers` library interface. The file [`gigaam_transformers.py`](https://huggingface.co/waveletdeboshir/gigaam-ctc/blob/main/gigaam_transformers.py) contains the model, feature extractor and tokenizer classes with the usual `transformers` methods. The model can be initialized with the `transformers` Auto classes (see the example below).

## Installation

Tested with the following library versions:

* `torch` 2.5.1
* `torchaudio` 2.5.1
* `transformers` 4.49.0

## Usage

Usage is the same as for other `transformers` ASR models.
```python
from transformers import AutoModel, AutoProcessor
import torch
import torchaudio

# load audio
wav, sr = torchaudio.load("audio.wav")

# resample to 16 kHz if necessary
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)

# load model and processor (custom model code, hence trust_remote_code=True)
processor = AutoProcessor.from_pretrained("waveletdeboshir/gigaam-ctc", trust_remote_code=True)
model = AutoModel.from_pretrained("waveletdeboshir/gigaam-ctc", trust_remote_code=True)
model.eval()

# extract features from the first audio channel
input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

# predict
with torch.no_grad():
    logits = model(**input_features).logits

# greedy decoding: pick the most likely token for every frame
greedy_ids = logits.argmax(dim=-1)

# decode token ids to text
transcription = processor.batch_decode(greedy_ids)[0]
```

## Fine-tune

[![Finetune In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/waveletdeboshir/c01334561f23c5167598b2054e50839a/gigaam-ctc-hf-finetune.ipynb)

[Fine-tuning Jupyter notebook](https://gist.github.com/waveletdeboshir/c01334561f23c5167598b2054e50839a)
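The greedy decoding step in the usage example picks the most likely token for every audio frame. Turning that frame-level path into text follows the standard CTC rule: merge consecutive repeats, then drop the blank token. The repository's tokenizer presumably applies this inside `batch_decode`. The library-free sketch below only illustrates the rule; the vocabulary, the blank id `0`, and the helper names `ctc_greedy_collapse` and `ids_to_text` are hypothetical and do not reflect GigaAM's actual tokenizer.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def ctc_greedy_collapse(frame_ids: Sequence[int], blank_id: int) -> List[int]:
    """Collapse a per-frame argmax path into a CTC label sequence.

    1. Merge consecutive repeated ids (one token emitted over several frames).
    2. Drop the blank id, which is what separates genuine repeated letters.
    """
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


def ids_to_text(ids: Sequence[int], vocab: Dict[int, str]) -> str:
    """Map collapsed token ids to characters (hypothetical char-level vocab)."""
    return "".join(vocab[i] for i in ids)


# Hypothetical toy vocabulary; blank id 0 is an assumption for illustration.
vocab = {0: "<blank>", 1: " ", 2: "д", 3: "а"}

# Per-frame argmax path, as logits.argmax(dim=-1) would produce for one utterance
path = [2, 2, 0, 3, 1, 1, 2, 0, 3, 3, 0]

ids = ctc_greedy_collapse(path, blank_id=0)
print(ids)                      # [2, 3, 1, 2, 3]
print(ids_to_text(ids, vocab))  # да да
```

Note that a doubled letter can only survive decoding if the model emits a blank between the two copies (for example `[3, 0, 3]` collapses to `[3, 3]`, while `[3, 3]` collapses to `[3]`).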