Help Needed: Not Getting the Expected Multiple Hypotheses from NeMo indicconformer_stt_ne_hybrid_ctc_rnnt_large Model

#1
by nabin2004 - opened

I’m working with the NeMo model indicconformer_stt_ne_hybrid_ctc_rnnt_large and trying to extract multiple hypotheses (e.g., the top 5 beam outputs) from its speech-to-text results. Specifically, I want to use beam decoding to get 5 alternative transcriptions for an audio file.

Here’s what I’ve tried so far:

  • Set beam_size = 5 and return_best_hypothesis = False in the decoding config
  • Used asr_model.change_decoding_strategy(...)
  • Tried modifying cur_decoder, cur_strategy, cur_beam_size, cur_nbest at runtime
  • Even unpacked the .nemo model, edited the config manually, and repacked it; the output still didn’t change.
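On the manual-edit route: a .nemo file is a tar archive (not a zip) that holds model_config.yaml next to the weights, so repacking it with a zip tool could produce a file NeMo cannot restore. The sketch below is a minimal stdlib way to rewrite only the config and keep everything else byte for byte. It assumes that archive layout, and `repack_nemo_config` is a hypothetical helper, not a NeMo function. NeMo’s `restore_from()` also accepts an `override_config_path` argument, which may make repacking unnecessary.

```python
import io
import os
import tarfile
from typing import Callable

# File name NeMo uses for the model config inside a .nemo archive (assumed layout)
CONFIG_NAME = "model_config.yaml"


def _is_config(member: tarfile.TarInfo) -> bool:
    return member.isfile() and os.path.basename(member.name) == CONFIG_NAME


def repack_nemo_config(src: str, dst: str, edit: Callable[[str], str]) -> None:
    """Copy the tar archive `src` to `dst`, passing model_config.yaml through `edit`."""
    # "r:*" auto-detects plain and gzip-compressed tar archives
    with tarfile.open(src, "r:*") as tin:
        members = tin.getmembers()
        if not any(_is_config(m) for m in members):
            raise FileNotFoundError(f"{CONFIG_NAME} not found in {src}")
        # Write an uncompressed tar (the format recent NeMo versions produce)
        with tarfile.open(dst, "w:") as tout:
            for member in members:
                if _is_config(member):
                    text = tin.extractfile(member).read().decode("utf-8")
                    payload = edit(text).encode("utf-8")
                    info = tarfile.TarInfo(name=member.name)
                    info.size = len(payload)
                    info.mode = member.mode
                    info.mtime = member.mtime
                    tout.addfile(info, io.BytesIO(payload))
                elif member.isfile():
                    # Copy weights, tokenizer files, etc. unchanged
                    tout.addfile(member, tin.extractfile(member))
                else:
                    # Directories and other entries carry no data payload
                    tout.addfile(member)
```

A plain string replacement passed as `edit` is crude; parsing the YAML with a proper library would be more robust if the config has several `beam_size` keys.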

Here’s a snippet of the approaches I used:

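```python
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf

# Load the hybrid CTC/RNNT checkpoint (placeholder path to the local .nemo file)
asr_model = nemo_asr.models.ASRModel.restore_from("indicconformer_stt_ne_hybrid_ctc_rnnt_large.nemo")
asr_model.eval()

# Beam search that should keep all beam_size hypotheses instead of only the best one
decoding_cfg = OmegaConf.create({
    'strategy': 'beam',
    'beam': {
        'beam_size': 5,
        'return_best_hypothesis': False,
        'score_norm': True
    }
})

asr_model.change_decoding_strategy(decoding_cfg)

audio_path = "../sample_data/nepali_1.wav"
print(f"\n[INFO] Transcribing audio file: {audio_path}")

# Runtime attributes I also tried; cur_decoder selects the RNNT head of the hybrid model
asr_model.cur_decoder = "rnnt"
asr_model.cur_strategy = "beam"
asr_model.cur_beam_size = 5
asr_model.cur_nbest = 5

rnnt_text = asr_model.transcribe([audio_path], batch_size=1, language_id='ne')
print("\n[ASR Raw Output]:", rnnt_text)
print("\n[ASR count]:", len(rnnt_text))
```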

Unfortunately, transcribe() always returns just two hypotheses (len(rnnt_text) is 2) no matter what I change, instead of the five I expect.
If anyone has hands-on experience with hybrid RNNT + CTC NeMo models and knows how to get multiple beam hypotheses (from either the RNNT or the CTC side), I would deeply appreciate any guidance, example code, or pointers to relevant resources or docs.
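One possible explanation for the count of 2, unverified for this model (which relies on AI4Bharat’s NeMo fork): in NeMo 1.x, RNNT-style transcribe() returns a pair (best hypotheses, all hypotheses), so len(rnnt_text) would be 2 for any beam size, with the n-best candidates inside the second element. The sketch below is a hypothetical helper (`split_transcribe_output` is not a NeMo function) that unpacks either shape. The attribute names `.text` (Hypothesis) and `.n_best_hypotheses` (NBestHypotheses) are assumptions about NeMo’s objects.

```python
from typing import Any, List, Tuple


def _to_text(hyp: Any) -> str:
    """Return the transcript for a plain string or a Hypothesis-like object."""
    if isinstance(hyp, str):
        return hyp
    return getattr(hyp, "text", str(hyp))


def _expand(entry: Any) -> List[str]:
    """Turn one per-file entry into a list of candidate transcripts."""
    # NBestHypotheses-like objects keep their candidates in `.n_best_hypotheses`
    candidates = getattr(entry, "n_best_hypotheses", None)
    if candidates is not None:
        return [_to_text(h) for h in candidates]
    if isinstance(entry, (list, tuple)):
        return [_to_text(h) for h in entry]
    return [_to_text(entry)]


def split_transcribe_output(result: Any) -> Tuple[List[str], List[List[str]]]:
    """Split a transcribe() result into (best transcript per file, n-best list per file)."""
    if isinstance(result, tuple) and len(result) == 2:
        best_raw, all_raw = result
    else:
        best_raw, all_raw = result, None
    best = [_to_text(h) for h in best_raw]
    all_list = list(all_raw) if all_raw is not None else []
    if not all_list:
        # No beam output: each file's n-best list is just its best transcript
        return best, [[b] for b in best]
    nested = [e for e in all_list
              if isinstance(e, (list, tuple)) or hasattr(e, "n_best_hypotheses")]
    if not nested:
        if len(best) == 1:
            # Flat list for a single input file: the whole list is that file's n-best
            return best, [[_to_text(h) for h in all_list]]
        if len(all_list) != len(best):
            raise ValueError("flat hypothesis list cannot be grouped per file")
    return best, [_expand(e) for e in all_list]
```

With the snippet above, `best, nbest = split_transcribe_output(rnnt_text)` followed by `print(nbest[0])` would show the candidates for the single audio file, if the pair-return assumption holds.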

