---
license: mit
language:
- ru
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- asr
- gigaam
- stt
- audio
- speech
- rnnt
- transducer
---

# GigaAM-v2-RNNT 🤗 Hugging Face transformers

* Original repository: https://github.com/salute-developers/GigaAM

Russian ASR model GigaAM-v2-RNNT.

## Model info
This is the original GigaAM-v2-RNNT model wrapped in a `transformers` library interface.

The file [`gigaam_transformers.py`](https://huggingface.co/waveletdeboshir/gigaam-rnnt/blob/main/gigaam_transformers.py) contains the model, feature extractor, and tokenizer classes with the usual `transformers` methods. The model can be initialized with the `transformers` auto classes (see the example below).

## Installation

The model was tested with the following library versions:
* `torch` 2.5.1
* `torchaudio` 2.5.1
* `transformers` 4.49.0
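
They can be installed with pip, for example: `pip install torch==2.5.1 torchaudio==2.5.1 transformers==4.49.0` (exact version pinning is optional).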

## Usage
Usage is the same as for other `transformers` ASR models.

```python
from transformers import AutoModel, AutoProcessor
import torch
import torchaudio

# load audio
wav, sr = torchaudio.load("audio.wav")
# resample if necessary
wav = torchaudio.functional.resample(wav, sr, 16000)

# load model and processor
processor = AutoProcessor.from_pretrained("waveletdeboshir/gigaam-rnnt", trust_remote_code=True)
model = AutoModel.from_pretrained("waveletdeboshir/gigaam-rnnt", trust_remote_code=True)
model.eval()

input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

# greedy prediction
with torch.no_grad():
    pred_ids = model.generate(**input_features)

# decode token ids to text
transcription = processor.batch_decode(pred_ids)[0]
```
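
The example above runs on CPU. As a minimal sketch (not part of the original card), inference can be moved to a GPU, assuming a CUDA device is available and that the processor output is a dict-like batch of tensors:

```python
# continuing the example above: optional GPU inference (sketch)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# move the prepared features to the same device as the model
input_features = {k: v.to(device) for k, v in input_features.items()}

with torch.no_grad():
    pred_ids = model.generate(**input_features)

transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
```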

## Fine-tune