Evaluation Dataset Size and Details
Hi,
Thanks for creating this space. I'd like to know more about the subsets used for evaluation, in particular whether the Common Voice evaluation uses the full test set.
Also, why does vhdm/whisper-large-fa-v1 have such a high WER? Is it because of hallucinations, or does it generate empty transcripts?
In addition, I'd like to recommend adding a new benchmark, PartAI/PSRB. The linked dataset only contains 1 hour of the full 10-hour benchmark, but the authors might be willing to share the full dataset for this benchmark space.
Hi,
As shown in the benchmark chart and table we published, we evaluate open-source models on several standard, open datasets, and Common Voice is one of them. When official splits exist, we use them (including the official test set) for reporting.
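For reference, here is a minimal sketch of how the official Common Voice Persian test split can be loaded with the 🤗 `datasets` library. The dataset repo and version (`mozilla-foundation/common_voice_17_0`) are assumptions for illustration only and are not necessarily the exact release pinned on the leaderboard:

```python
# Minimal sketch: load the official Common Voice Persian test split.
# NOTE: the repo/version below is assumed for illustration; the leaderboard
# may pin a different Common Voice release. Accessing the dataset requires
# accepting its terms on the Hub and authenticating with a token.
from datasets import load_dataset

cv_test = load_dataset(
    "mozilla-foundation/common_voice_17_0",  # assumed release
    "fa",                                    # Persian config
    split="test",                            # official test split, used in full
)
print(len(cv_test), "utterances in the official test split")
```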
Regarding vhdm/whisper-large-fa-v1: it has been widely promoted on social media, which made us curious to test it. In our evaluation, the model appears to have fundamental training issues; its high WER is largely due to empty outputs and hallucinations/out-of-domain text (sometimes in English). We normally wouldn’t include such models on the leaderboard, but given the extensive promotion, we added it to provide transparency about its actual accuracy.
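To make those failure modes concrete, here is a small sketch using `jiwer` with made-up toy sentences (not samples from our evaluation set): an empty transcript turns every reference word into a deletion, and hallucinated out-of-domain text adds substitutions plus insertions, which can push WER above 100%. This assumes a recent `jiwer` version that accepts an empty hypothesis string:

```python
# Toy illustration (not actual benchmark data): how empty outputs and
# hallucinations inflate WER. Requires `pip install jiwer`.
import jiwer

reference = "سلام دنیا چطوری"  # 3 reference words

# Case 1: empty transcript -> every reference word is a deletion -> WER = 1.0
print(jiwer.wer(reference, ""))

# Case 2: hallucinated English text -> 3 substitutions + 5 insertions on a
# 3-word reference -> WER well above 1.0
hallucination = "thank you for watching my video please subscribe"
print(jiwer.wer(reference, hallucination))
```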
Thanks for suggesting PartAI/PSRB; we’ll review and add it. If we can obtain the full 10-hour benchmark, we’ll use that; otherwise, we’ll start with the 1-hour subset.