---
license: mit
language:
- ru
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- asr
---

# GigaAMv2-CTC Hugging Face transformers

* Original repo: https://github.com/salute-developers/GigaAM

Russian ASR model.

## Model info

This is the original GigaAMv2-CTC model wrapped in a `transformers` library interface. The file `gigaam_transformers.py` contains the model, feature extractor, and tokenizer classes with the usual `transformers` methods. The Jupyter notebook `GigaAMHFTrain.ipynb` contains a training pipeline built on `transformers`.

## Usage

Usage is the same as for other `transformers` ASR models.

```python
>>> from gigaam_transformers import GigaAMCTCHF, GigaAMProcessor
>>> import torchaudio

>>> # load audio
>>> wav, sr = torchaudio.load("audio.wav")
>>> # resample if necessary
>>> wav = torchaudio.functional.resample(wav, sr, 16000)

>>> # load model and processor
>>> processor = GigaAMProcessor.from_pretrained("waveletdeboshir/gigaam-ctc")
>>> model = GigaAMCTCHF.from_pretrained("waveletdeboshir/gigaam-ctc")

>>> # extract features
>>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

>>> # predict
>>> pred = model(input_features)

>>> # greedy decoding
>>> greedy_ids = pred.predictions.argmax(dim=-1)
>>> # decode token ids to text
>>> transcription = processor.batch_decode(greedy_ids)
```

## Finetune
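
The full training pipeline is in the `GigaAMHFTrain.ipynb` notebook. For orientation only, below is a minimal single-example fine-tuning sketch; the CTC loss wiring, the blank token id, and the way label token ids are obtained are assumptions and may differ from the notebook.

```python
import torch
from gigaam_transformers import GigaAMCTCHF, GigaAMProcessor

# Load processor and model exactly as in the Usage section
processor = GigaAMProcessor.from_pretrained("waveletdeboshir/gigaam-ctc")
model = GigaAMCTCHF.from_pretrained("waveletdeboshir/gigaam-ctc")
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
# blank=0 is an assumption; check the tokenizer's actual blank id
ctc_loss = torch.nn.CTCLoss(blank=0, zero_infinity=True)

def training_step(wav, target_ids):
    """One optimization step on a single (1-D waveform tensor, transcript token ids) pair.
    target_ids is a list of label ids, e.g. produced by the processor's tokenizer."""
    inputs = processor(wav, sampling_rate=16000, return_tensors="pt")
    targets = torch.tensor([target_ids], dtype=torch.long)

    pred = model(inputs)                              # same call as in Usage
    log_probs = pred.predictions.log_softmax(dim=-1)  # assumed shape (batch, time, vocab)
    log_probs = log_probs.transpose(0, 1)             # CTCLoss expects (time, batch, vocab)

    input_lengths = torch.full((log_probs.size(1),), log_probs.size(0), dtype=torch.long)
    target_lengths = torch.tensor([len(target_ids)], dtype=torch.long)

    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For the actual data preparation, batching, and evaluation, follow `GigaAMHFTrain.ipynb`.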