---
license: mit
language:
- ru
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- asr
- gigaam
- stt
- ru
- ctc
- audio
- speech
---

# GigaAM-v2-CTC Hugging Face transformers

Russian ASR model GigaAM-v2-CTC.

* Original repository: https://github.com/salute-developers/GigaAM

## Model info

This is the original GigaAM-v2-CTC model with a `transformers` library interface. The file [`gigaam_transformers.py`](https://huggingface.co/waveletdeboshir/gigaam-ctc/blob/main/gigaam_transformers.py) contains the model, feature extractor, and tokenizer classes with the usual `transformers` methods.

## Installation

Install `GigaAM`:

```sh
git clone https://github.com/salute-developers/GigaAM.git
cd GigaAM
pip install -e .
```

Library versions used:

* `torch` 2.5.1
* `torchaudio` 2.5.1
* `transformers` 4.49.0

## Usage

Usage is the same as for other `transformers` ASR models.

```python
import torch
import torchaudio

from gigaam_transformers import GigaAMCTCHF, GigaAMProcessor

# Load audio
wav, sr = torchaudio.load("audio.wav")
# Resample to 16 kHz if necessary
wav = torchaudio.functional.resample(wav, sr, 16000)

# Load model and processor
processor = GigaAMProcessor.from_pretrained("waveletdeboshir/gigaam-ctc")
model = GigaAMCTCHF.from_pretrained("waveletdeboshir/gigaam-ctc")
model.eval()

input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

# Predict
with torch.no_grad():
    logits = model(**input_features).logits

# Greedy decoding: take the most likely token at each frame
greedy_ids = logits.argmax(dim=-1)

# Decode token ids to text
transcription = processor.batch_decode(greedy_ids)[0]
```

## Fine-tune

todo
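A fine-tuning recipe is not published yet. As a general starting point, CTC models are usually fine-tuned by minimizing the CTC loss over the model's log-probabilities, e.g. with `torch.nn.functional.ctc_loss`. The sketch below is only an illustration of that loss call under assumed shapes and a blank id of 0 — it uses random tensors in place of real model outputs and tokenized targets, and is not the authors' recipe.

```python
import torch
import torch.nn.functional as F

# Dummy shapes standing in for real model outputs:
# (time, batch, vocab) log-probabilities, the layout ctc_loss expects.
# The vocab size and blank id here are assumptions for illustration only.
T, B, V = 50, 2, 34
log_probs = torch.randn(T, B, V, requires_grad=True).log_softmax(dim=-1)

# Dummy integer targets; real ones would come from the tokenizer.
targets = torch.randint(1, V, (B, 20), dtype=torch.long)
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 20, dtype=torch.long)

# CTC loss with blank id 0 (an assumption); gradients flow back to
# the log-probabilities, i.e. to the model parameters in a real setup.
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0)
loss.backward()
```

In a real training loop the `log_probs` would be `model(**input_features).logits.log_softmax(dim=-1)` transposed to time-first, and the lengths would come from the feature extractor and tokenizer.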