Adding Phi-4-Multimodal
Here are the results of Phi-4-Multimodal on the Open ASR leaderboard benchmarks:
Filtering models by id: microsoft/Phi-4-multimodal-instruct
Results per dataset:
microsoft/Phi-4-multimodal-instruct | hf-audio-esb-datasets-test-only-sorted_ami_test: WER = 11.45 %, RTFx = 33.35
microsoft/Phi-4-multimodal-instruct | hf-audio-esb-datasets-test-only-sorted_earnings22_test: WER = 10.50 %, RTFx = 33.66
microsoft/Phi-4-multimodal-instruct | hf-audio-esb-datasets-test-only-sorted_gigaspeech_test: WER = 9.77 %, RTFx = 41.77
microsoft/Phi-4-multimodal-instruct | hf-audio-esb-datasets-test-only-sorted_librispeech_test.clea: WER = 1.67 %, RTFx = 47.28
microsoft/Phi-4-multimodal-instruct | hf-audio-esb-datasets-test-only-sorted_librispeech_test.other: WER = 3.82 %, RTFx = 45.86
microsoft/Phi-4-multimodal-instruct | hf-audio-esb-datasets-test-only-sorted_spgispeech_test: WER = 3.11 %, RTFx = 49.44
microsoft/Phi-4-multimodal-instruct | hf-audio-esb-datasets-test-only-sorted_tedlium_test: WER = 2.89 %, RTFx = 43.44
microsoft/Phi-4-multimodal-instruct | hf-audio-esb-datasets-test-only-sorted_voxpopuli_test: WER = 5.93 %, RTFx = 47.18
Composite Results:
microsoft/Phi-4-multimodal-instruct: WER = 6.14 %
microsoft/Phi-4-multimodal-instruct: RTFx = 45.52
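As a quick sanity check on the composite numbers, the sketch below recomputes the averages from the per-dataset values listed above. The dictionary keys are shorthand labels chosen for this example, not the exact dataset identifiers. The composite WER matches the unweighted mean of the eight per-dataset WERs. The composite RTFx does not match the simple mean of the per-dataset RTFx values, so it is presumably aggregated differently (for example, total audio duration divided by total transcription time); that explanation is an assumption, not something stated in the log.

```python
# Per-dataset (WER %, RTFx) pairs copied from the results above.
# Keys are shorthand labels for this sketch, not the real dataset ids.
results = {
    "ami": (11.45, 33.35),
    "earnings22": (10.50, 33.66),
    "gigaspeech": (9.77, 41.77),
    "librispeech_clean": (1.67, 47.28),
    "librispeech_other": (3.82, 45.86),
    "spgispeech": (3.11, 49.44),
    "tedlium": (2.89, 43.44),
    "voxpopuli": (5.93, 47.18),
}

wers = [wer for wer, _ in results.values()]
rtfxs = [rtfx for _, rtfx in results.values()]

# Unweighted (macro) averages over the eight test sets.
mean_wer = sum(wers) / len(wers)
mean_rtfx = sum(rtfxs) / len(rtfxs)

print(f"mean WER  = {mean_wer:.2f} %  (reported composite: 6.14 %)")
print(f"mean RTFx = {mean_rtfx:.2f}    (reported composite: 45.52)")
```

The gap between the mean RTFx and the reported 45.52 is consistent with a duration-weighted aggregate, but the per-dataset audio durations are not given here, so that cannot be checked from these numbers alone.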
For comparison, here are the results reported in the technical report: