mjwong committed (verified)
Commit ce2f4a1 · 1 Parent(s): 6251725

Update README.md

Files changed (1): README.md (+9 -2)
README.md CHANGED
@@ -94,6 +94,8 @@ We evaluated the speculative decoding setup for Whisper-large-v3-singlish on the
 
 - [AMI](https://huggingface.co/datasets/edinburghcstr/ami): A widely used dataset for meeting transcription and diarization tasks. This work specifically uses the IHM (Individual Headset Microphone) recordings.
 
+- [GigaSpeech](https://huggingface.co/datasets/speechcolab/gigaspeech): A large-scale open-source dataset with diverse English audio, covering read, conversational, and spontaneous speech.
+
 ### Model Performance
 
 | **Dataset** | **Model Variant** | **Link** | **Rel. RTFx** | **WER** |
@@ -105,17 +107,22 @@ We evaluated the speculative decoding setup for Whisper-large-v3-singlish on the
 | AMI | Large | [Whisper-large-v3-singlish](https://huggingface.co/mjwong/whisper-large-v3-singlish) | 1.00 | 23.72% |
 | AMI | Large-Turbo | [Whisper-large-v3-turbo-singlish](https://huggingface.co/mjwong/whisper-large-v3-turbo-singlish) | 1.53 | **16.99%** |
 | AMI | Draft-enhanced Large | Whisper-large-v3-singlish + [DRAFT](https://huggingface.co/mjwong/whisper-large-v3-singlish-DRAFT) | **2.27** | 22.06% |
+||||||
+| GigaSpeech | Large | [Whisper-large-v3-singlish](https://huggingface.co/mjwong/whisper-large-v3-singlish) | 1.00 | 13.15% |
+| GigaSpeech | Large-Turbo | [Whisper-large-v3-turbo-singlish](https://huggingface.co/mjwong/whisper-large-v3-turbo-singlish) | 1.95 | **11.54%** |
+| GigaSpeech | Draft-enhanced Large | Whisper-large-v3-singlish + [DRAFT](https://huggingface.co/mjwong/whisper-large-v3-singlish-DRAFT) | **2.37** | 12.81% |
 
 ### Speculative Acceptance Rates (DRAFT-enhanced Large Model)
 
-| **Dataset** | **Micro Avg Acceptance** | **Macro Avg Acceptance** |
+| **Dataset** | **Micro Avg Acceptance** | **Macro Avg Acceptance** |
 |----------------|--------------------------|---------------------------|
 | SASRBench-v1 | 38.00% | 42.00% |
 | AMI | 38.00% | 43.00% |
+| GigaSpeech | 31.00% | 37.00% |
 
 ### Conclusion
 
-While it does not outperform Large-Turbo in WER, the Draft-enhanced Large model demonstrates strong speculative acceptance rates (~38–43%), indicating meaningful potential for runtime gains through early prediction acceptance. In latency-sensitive applications, it offers a compelling middle ground between the high accuracy of Large-Turbo and the slower inference of standard decoding.
+While it does not outperform Large-Turbo in WER, the Draft-enhanced Large model demonstrates strong speculative acceptance rates (~31–43%), indicating meaningful potential for runtime gains through early prediction acceptance. In latency-sensitive applications, it offers a compelling middle ground between the high accuracy of Large-Turbo and the slower inference of standard decoding.
 
 ## Disclaimer
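
For context on the "Draft-enhanced Large" rows above, here is a minimal sketch of how such a speculative decoding setup could be run with 🤗 Transformers assisted generation. It assumes the DRAFT checkpoint loads as a standard Whisper seq2seq model and uses a placeholder input file; consult the model cards for the author's recommended usage.

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Main (verifier) model: Whisper-large-v3-singlish
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "mjwong/whisper-large-v3-singlish", torch_dtype=dtype
).to(device)

# Draft (assistant) model used to propose tokens during speculative decoding
assistant = AutoModelForSpeechSeq2Seq.from_pretrained(
    "mjwong/whisper-large-v3-singlish-DRAFT", torch_dtype=dtype
).to(device)

processor = AutoProcessor.from_pretrained("mjwong/whisper-large-v3-singlish")

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=dtype,
    device=device,
    # Passing the draft model enables assisted (speculative) generation:
    # the draft proposes tokens and the main model verifies them.
    generate_kwargs={"assistant_model": assistant},
)

print(asr("audio.wav")["text"])  # "audio.wav" is a placeholder audio path
```

The relative RTFx gains in the table come from the verifier accepting a share of the draft's proposed tokens each step, which is what the acceptance-rate table measures: higher acceptance means fewer full forward passes by the large model.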