This model is fine-tuned from microsoft/Phi-4-multimodal-instruct on Bingsu/zeroth-korean and google/fleurs for 5 epochs.
It was trained for 960 steps on H100 for Korean automatic speech recognition (ASR).
After that, we continued training on the CoVoST2 Dataset / CoVoST2-Ko for automatic speech translation (AST).
The AST fine-tuned model is available here: Phi-4-multimodal-instruct-ko-speech
Evaluation
Evaluation was done on the following datasets:
- ASR (Automatic Speech Recognition): Evaluated with CER (Character Error Rate) and WER (Word Error Rate) on the zeroth-test set (457 samples).
- AST (Automatic Speech Translation): Evaluated with BLEU score on fleurs ko <-> en speech translation (270 samples).
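As a rough illustration of the ASR metrics named above, the following is a minimal sketch of CER and WER built on Levenshtein edit distance. It is not the evaluation script used for this model. The function names `edit_distance`, `cer`, and `wer` are hypothetical, and the choices to strip spaces before computing CER and to apply no text normalization are assumptions; the actual script may differ on both.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces are ignored)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # One wrong syllable out of five reference characters.
    print(cer("안녕하세요", "안녕하세오"))
    # A hypothesis much longer than the reference pushes CER past 100.
    print(cer("가나", "가나다라마바"))
```

Because the error count is divided by the reference length, a model that emits far more text than was spoken can score above 100, which is how a CER such as 198.32 can occur.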
The evaluation script was retrieved from here.
Compared to Phi-4-mm-inst-zeroth-kor and Phi-4-multimodal-finetune-ko-speech, ASR accuracy is improved: zeroth CER drops to 1.31, from 7.02 and 1.61 respectively.
| Model | zeroth-CER | zeroth-WER | fleurs-ko_en-BLEU | fleurs-ko_en-cot-BLEU | fleurs-en_ko-BLEU | fleurs-en_ko-cot-BLEU |
|---|---|---|---|---|---|---|
| original | 198.32 | - | 5.63 | 2.42 | 6.86 | 4.17 |
| daekeun-ml/Phi-4-multimodal-finetune-ko-speech | 1.61 | 3.54 | 7.67 | 8.38 | 12.31 | 9.69 |
| seastar105/Phi-4-mm-inst-zeroth-kor | 7.02 | - | 7.07 | 9.19 | 13.08 | 9.35 |
| ASR finetune (this model) | 1.31 | 2.95 | 7.46 | 6.24 | 12.15 | 8.91 |
| + 1 epoch finetune with Covost-Ko | 3.88 | - | 8.07 | 10.09 | 18.82 | 15.41 |
| AST finetuned model | 1.77 | 2.99 | 8.01 | 9.09 | 17.09 | 11.82 |
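To put the zeroth-CER column in proportion, the sketch below computes the relative CER reduction of this model against the two compared fine-tunes, using only values from the table. The helper name `relative_reduction` and the dictionary layout are illustrative assumptions, not part of any evaluation script.

```python
def relative_reduction(baseline, new):
    """Relative error reduction in percent, going from baseline to new."""
    return 100.0 * (baseline - new) / baseline


# zeroth-CER values copied from the table above.
zeroth_cer = {
    "daekeun-ml/Phi-4-multimodal-finetune-ko-speech": 1.61,
    "seastar105/Phi-4-mm-inst-zeroth-kor": 7.02,
}
this_model_cer = 1.31

for name, baseline in zeroth_cer.items():
    reduction = relative_reduction(baseline, this_model_cer)
    print(f"{name}: {reduction:.1f}% relative CER reduction")
```

This works out to roughly 19% and 81% relative reduction, respectively. These are plain arithmetic on the reported numbers and say nothing about statistical significance on the 457-sample test set.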
Evaluation results
- zeroth-test-BLEU on zeroth-korean-test (self-reported): 94.837
- zeroth-test-CER on zeroth-korean-test (self-reported): 1.316
- zeroth-test-WER on zeroth-korean-test (self-reported): 2.951
- fleurs-test-BLEU on fleurs-ko-test (self-reported): 67.659
- fleurs-test-CER on fleurs-ko-test (self-reported): 7.951
- fleurs-test-WER on fleurs-ko-test (self-reported): 18.313