---
license: mit
language:
- ru
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- asr
- gigaam
- stt
- ru
- ctc
- audio
- speech
---

# GigaAM-v2-CTC Hugging Face transformers

Russian ASR model GigaAM-v2-CTC.

* Original repository: https://github.com/salute-developers/GigaAM

## Model info

This is the original GigaAM-v2-CTC model with a `transformers` library interface. The file [`gigaam_transformers.py`](https://huggingface.co/waveletdeboshir/gigaam-ctc/blob/main/gigaam_transformers.py) contains the model, feature extractor, and tokenizer classes with the usual `transformers` methods.

## Installation

Install `GigaAM`:

```sh
git clone https://github.com/salute-developers/GigaAM.git
cd GigaAM
pip install -e .
```

Library versions used:

* `torch` 2.5.1
* `torchaudio` 2.5.1
* `transformers` 4.49.0

## Usage

Usage is the same as for other `transformers` ASR models.

```python
import torch
import torchaudio

from gigaam_transformers import GigaAMCTCHF, GigaAMProcessor

# Load audio
wav, sr = torchaudio.load("audio.wav")
# Resample to 16 kHz if necessary
wav = torchaudio.functional.resample(wav, sr, 16000)

# Load model and processor
processor = GigaAMProcessor.from_pretrained("waveletdeboshir/gigaam-ctc")
model = GigaAMCTCHF.from_pretrained("waveletdeboshir/gigaam-ctc")
model.eval()

input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

# Predict
with torch.no_grad():
    logits = model(**input_features).logits

# Greedy decoding: take the most likely token at each frame
greedy_ids = logits.argmax(dim=-1)

# Decode token ids to text
transcription = processor.batch_decode(greedy_ids)[0]
```

## Fine-tune

todo
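A fine-tuning recipe is not published yet. As a general starting point, CTC models are usually fine-tuned by minimizing the CTC loss over the model's log-probabilities, e.g. with `torch.nn.functional.ctc_loss`. The sketch below is only an illustration of that loss call under assumed shapes and a blank id of 0 — it uses random tensors in place of real model outputs and tokenized targets, and is not the authors' recipe.

```python
import torch
import torch.nn.functional as F

# Dummy shapes standing in for real model outputs:
# (time, batch, vocab) log-probabilities, the layout ctc_loss expects.
# The vocab size and blank id here are assumptions for illustration only.
T, B, V = 50, 2, 34
log_probs = torch.randn(T, B, V, requires_grad=True).log_softmax(dim=-1)

# Dummy integer targets; real ones would come from the tokenizer.
targets = torch.randint(1, V, (B, 20), dtype=torch.long)
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 20, dtype=torch.long)

# CTC loss with blank id 0 (an assumption); gradients flow back to
# the log-probabilities, i.e. to the model parameters in a real setup.
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0)
loss.backward()
```

In a real training loop the `log_probs` would be `model(**input_features).logits.log_softmax(dim=-1)` transposed to time-first, and the lengths would come from the feature extractor and tokenizer.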