# GigaAM-v2-CTC with ngram LM and beamsearch 🤗 Hugging Face transformers
- original git: https://github.com/salute-developers/GigaAM
- ngram LM from bond005/wav2vec2-large-ru-golos-with-lm
Russian ASR model GigaAM-v2-CTC with external ngram LM and beamsearch decoding.
## Model info
This is the original GigaAM-v2-CTC model with a `transformers` library interface, beamsearch decoding and hypothesis rescoring with an external ngram LM. In addition, it can be used to extract word-level timestamps.

The file `gigaam_transformers.py` contains the model, feature extractor and tokenizer classes with the usual `transformers` methods. The model can be initialized with `transformers` auto classes (see the example below).
## Installation
My library versions:

- torch 2.5.1
- torchaudio 2.5.1
- transformers 4.49.0
You need to install `kenlm` and `pyctcdecode`:

```bash
pip install kenlm
pip install pyctcdecode
```
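Optional: a minimal sketch to check that the decoding dependencies are importable and see which versions you ended up with (nothing here is specific to this repo):

```python
import importlib.metadata as md

# Print installed versions of the packages used below (or a warning if missing)
for pkg in ("torch", "torchaudio", "transformers", "pyctcdecode", "kenlm"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "NOT INSTALLED")
```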
## Usage
Usage is the same as for other `transformers` ASR models.
```python
from transformers import AutoModel, AutoProcessor
import torch
import torchaudio

# load audio
wav, sr = torchaudio.load("audio.wav")
# resample if necessary
wav = torchaudio.functional.resample(wav, sr, 16000)

# load model and processor
processor = AutoProcessor.from_pretrained("waveletdeboshir/gigaam-ctc-with-lm", trust_remote_code=True)
model = AutoModel.from_pretrained("waveletdeboshir/gigaam-ctc-with-lm", trust_remote_code=True)
model.eval()

input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")

# predict
with torch.no_grad():
    logits = model(**input_features).logits

# decoding with beamsearch and LM (tune alpha, beta, beam_width for your data)
transcription = processor.batch_decode(
    logits=logits.numpy(),
    beam_width=64,
    alpha=0.5,
    beta=0.5,
).text[0]
```
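The `alpha` (LM weight), `beta` (word insertion bonus) and `beam_width` values above are only starting points. Below is a minimal sketch of a grid search over `alpha`/`beta` on a small labeled dev set; the `dev_set` list, the `transcribe` helper and the use of `jiwer` for WER are assumptions for illustration, not part of this repo:

```python
import itertools

import torch
import torchaudio
from jiwer import wer  # pip install jiwer

def transcribe(wav_path, alpha, beta, beam_width=64):
    # Hypothetical helper: reuses the `model` and `processor` loaded above
    wav, sr = torchaudio.load(wav_path)
    wav = torchaudio.functional.resample(wav, sr, 16000)
    features = processor(wav[0], sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**features).logits
    return processor.batch_decode(
        logits=logits.numpy(), beam_width=beam_width, alpha=alpha, beta=beta
    ).text[0]

# Hypothetical dev set: (audio path, reference transcription) pairs
dev_set = [("example1.wav", "пример расшифровки")]

best = None
for alpha, beta in itertools.product([0.3, 0.5, 0.7, 1.0], [0.0, 0.5, 1.0]):
    hyps = [transcribe(path, alpha, beta) for path, _ in dev_set]
    refs = [ref for _, ref in dev_set]
    score = wer(refs, hyps)
    if best is None or score < best[0]:
        best = (score, alpha, beta)

print(f"best WER={best[0]:.3f} with alpha={best[1]}, beta={best[2]}")
```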
## Decoding with timestamps
We can use the decoder to extract word-level timestamps. For this we need to know the model stride and set the parameter `output_word_offsets=True`.

In our case (Conformer) `MODEL_STRIDE = 40` ms per timestamp.
```python
MODEL_STRIDE = 40

outputs = processor.batch_decode(
    logits=logits.numpy(),
    beam_width=64,
    alpha=0.5,
    beta=0.5,
    output_word_offsets=True,
)

word_ts = [
    {
        "word": d["word"],
        "start": round(d["start_offset"] * MODEL_STRIDE / 1000, 2),
        "end": round(d["end_offset"] * MODEL_STRIDE / 1000, 2),
    }
    for d in outputs.word_offsets[0]
]
```
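The resulting `word_ts` list can be dumped into any subtitle or annotation format. A minimal sketch that writes one SRT entry per word (the `to_srt` helper and output path are illustrative, not part of this repo):

```python
def to_srt(word_ts, path="transcript.srt"):
    """Write one subtitle entry per word from the word_ts list built above."""

    def fmt(seconds):
        # seconds -> "HH:MM:SS,mmm"
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    with open(path, "w", encoding="utf-8") as f:
        for i, w in enumerate(word_ts, start=1):
            f.write(f"{i}\n{fmt(w['start'])} --> {fmt(w['end'])}\n{w['word']}\n\n")

to_srt(word_ts)
```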