Help Needed: Not Getting the Expected Multiple Hypotheses from NeMo indicconformer_stt_ne_hybrid_ctc_rnnt_large Model
#1
by
nabin2004
- opened
I’m working with the NeMo model: indicconformer_stt_ne_hybrid_ctc_rnnt_large and I’m trying to extract multiple hypotheses (e.g., top 5 beam outputs) from the speech-to-text results. Specifically, I want to use beam decoding to get 5 alternate transcriptions for an audio file.
Here’s what I’ve tried so far:
- Set beam_size = 5 and return_best_hypothesis = False in the decoding config
- Used asr_model.change_decoding_strategy(...)
- Tried modifying cur_decoder, cur_strategy, cur_beam_size, cur_nbest at runtime
- Even unzipped the .nemo model, edited the config manually, rezipped it — still didn’t work.
Here’s a snippet of approaches I used:
from omegaconf import OmegaConf

# asr_model is assumed to be the already-loaded hybrid CTC/RNNT model.
# Beam-search decoding config that asks for all 5 beams, not only the best one.
decoding_cfg = OmegaConf.create({
    'strategy': 'beam',
    'beam': {
        'beam_size': 5,
        'return_best_hypothesis': False,
        'score_norm': True
    }
})
asr_model.change_decoding_strategy(decoding_cfg)

audio_path = "../sample_data/nepali_1.wav"
print(f"\n[INFO] Transcribing audio file: {audio_path}")

# Runtime overrides of the model's cur_* attributes that I also tried.
asr_model.cur_decoder = "rnnt"
asr_model.cur_strategy = "beam"
asr_model.cur_beam_size = 5
asr_model.cur_nbest = 5

rnnt_text = asr_model.transcribe([audio_path], batch_size=1, language_id='ne')
print("\n[ASR Raw Output]:", rnnt_text)
print("\n[ASR count]:", len(rnnt_text))
Unfortunately, transcribe() always returns just two hypotheses, no matter what I do.
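One thing that may be worth checking before changing more settings is what the returned object actually contains. In some NeMo releases the RNNT transcribe() returns a 2-tuple of (best hypotheses, all hypotheses), in which case len() would be 2 regardless of the beam size. Whether that applies to this model and NeMo version is an assumption, not something confirmed here. The sketch below uses only plain Python; the helper names (hypothesis_text, nbest_texts, unpack_transcribe_output) are hypothetical, and the `text` and `n_best_hypotheses` attributes are assumed to exist on the hypothesis objects.

```python
from typing import Any, List, Optional, Tuple


def hypothesis_text(hyp: Any) -> str:
    """Return the transcript of one hypothesis (plain str or object with .text)."""
    if isinstance(hyp, str):
        return hyp
    # Assumption: hypothesis objects expose their transcript as `.text`.
    return getattr(hyp, "text", str(hyp))


def nbest_texts(entry: Any) -> List[str]:
    """Return every candidate transcript stored for a single audio file."""
    # Assumption: n-best containers expose their list as `.n_best_hypotheses`.
    if hasattr(entry, "n_best_hypotheses"):
        entry = entry.n_best_hypotheses
    if isinstance(entry, (list, tuple)):
        return [hypothesis_text(h) for h in entry]
    return [hypothesis_text(entry)]


def unpack_transcribe_output(output: Any) -> Tuple[List[str], Optional[List[List[str]]]]:
    """Split a transcribe() result into (best per file, n-best per file or None)."""
    if isinstance(output, tuple) and len(output) == 2:
        # Assumed layout: (best_hypotheses, all_hypotheses).
        best, all_hyps = output
    else:
        best, all_hyps = output, None
    best_list = [hypothesis_text(b) for b in best]
    nbest_list = None if all_hyps is None else [nbest_texts(e) for e in all_hyps]
    return best_list, nbest_list


# Simulated result for ONE audio file with 3 beams (made-up data, not real model output).
fake_output = (["hyp A"], [["hyp A", "hyp B", "hyp C"]])
best, nbest = unpack_transcribe_output(fake_output)
print(len(fake_output))   # length of the outer tuple, not the number of beams
print(best)
print(nbest)
```

If the real output has this shape, the alternate transcriptions would be in the second element, one list per audio file, so the thing to count is len(rnnt_text[1][0]) rather than len(rnnt_text).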
If anyone has working experience with hybrid RNNT + CTC NeMo models and knows how to get multiple beam hypotheses (especially for RNNT or CTC side), I would deeply appreciate any guidance, example code, or even pointers to relevant resources/docs.