---
license: mit
language:
- ru
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- asr
- gigaam
- stt
- audio
- speech
- rnnt
- transducer
---

# GigaAM-v2-RNNT 🤗 Hugging Face transformers

* Original repository: https://github.com/salute-developers/GigaAM

Russian ASR model GigaAM-v2-RNNT.

## Model info
This is the original GigaAM-v2-RNNT model with a `transformers` library interface.

The file [`gigaam_transformers.py`](https://huggingface.co/waveletdeboshir/gigaam-rnnt/blob/main/gigaam_transformers.py) contains the model, feature extractor and tokenizer classes with the usual `transformers` methods. The model can be initialized with the `transformers` auto classes (see the example below).

## Installation

Library versions I used:
* `torch` 2.5.1
* `torchaudio` 2.5.1
* `transformers` 4.49.0
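
To compare a local environment against the versions listed above, a small check along these lines may help. It is only a sketch: `REFERENCE_VERSIONS` and `compare_versions` are hypothetical names introduced here, and a differing version does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README (hypothetical constant name).
REFERENCE_VERSIONS = {
    "torch": "2.5.1",
    "torchaudio": "2.5.1",
    "transformers": "4.49.0",
}


def compare_versions(reference=REFERENCE_VERSIONS):
    """Map each package name to (installed_version_or_None, reference_version)."""
    report = {}
    for name, ref in reference.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            installed = None
        report[name] = (installed, ref)
    return report


if __name__ == "__main__":
    for name, (installed, ref) in compare_versions().items():
        # Drop a local build suffix such as "+cu121" before comparing.
        base = installed.split("+")[0] if installed else None
        status = "ok" if base == ref else "differs"
        print(f"{name}: installed={installed}, reference={ref} ({status})")
```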

## Usage
Usage is the same as for other `transformers` ASR models.

```python
from transformers import AutoModel, AutoProcessor
import torch
import torchaudio

# load audio; wav has shape (channels, samples)
wav, sr = torchaudio.load("audio.wav")
# resample to 16 kHz (the input is returned unchanged if it is already 16 kHz)
wav = torchaudio.functional.resample(wav, sr, 16000)

# load model and processor (trust_remote_code=True runs the custom classes from the model repo)
processor = AutoProcessor.from_pretrained("waveletdeboshir/gigaam-rnnt", trust_remote_code=True)
model = AutoModel.from_pretrained("waveletdeboshir/gigaam-rnnt", trust_remote_code=True)
model.eval()

# extract input features from the first audio channel
input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

# greedy prediction
with torch.no_grad():
    pred_ids = model.generate(**input_features)

# decode token ids to text
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
```
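
The example above resamples to 16 kHz and passes only the first channel (`wav[0]`) to the processor. Before loading a file, its header can be inspected to see whether either step will change the audio. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files only; `describe_wav` and `TARGET_SAMPLE_RATE` are hypothetical names introduced for illustration and are not part of this repository.

```python
import wave

# Sampling rate passed to the processor in the usage example (hypothetical constant name).
TARGET_SAMPLE_RATE = 16000


def describe_wav(path):
    """Read the header of a PCM WAV file and report whether conversion is needed."""
    with wave.open(path, "rb") as f:
        sample_rate = f.getframerate()
        channels = f.getnchannels()
        n_frames = f.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_sec": n_frames / sample_rate,
        # True if the file's rate differs from the rate used in the example.
        "needs_resample": sample_rate != TARGET_SAMPLE_RATE,
        # True if the file has more than one channel; the example keeps only the first.
        "needs_downmix": channels > 1,
    }
```

Calling `describe_wav("audio.wav")` returns a dictionary; if `needs_downmix` is true, keep in mind that `wav[0]` discards the other channels rather than averaging them.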

## Fine-tune