mjwong committed (verified)
Commit ce2f4a1 · 1 Parent(s): 6251725

Update README.md

Files changed (1): README.md (+9 -2)
README.md CHANGED
@@ -94,6 +94,8 @@ We evaluated the speculative decoding setup for Whisper-large-v3-singlish on the
 
 - [AMI](https://huggingface.co/datasets/edinburghcstr/ami): A widely used dataset for meeting transcription and diarization tasks. This work specifically uses the IHM (Individual Headset Microphone) recordings.
 
+- [GigaSpeech](https://huggingface.co/datasets/speechcolab/gigaspeech): A large-scale open-source dataset with diverse English audio, covering read, conversational, and spontaneous speech.
+
 ### Model Performance
 
 | **Dataset** | **Model Variant** | **Link** | **Rel. RTFx** | **WER** |
@@ -105,17 +107,22 @@ We evaluated the speculative decoding setup for Whisper-large-v3-singlish on the
 | AMI | Large | [Whisper-large-v3-singlish](https://huggingface.co/mjwong/whisper-large-v3-singlish) | 1.00 | 23.72% |
 | AMI | Large-Turbo | [Whisper-large-v3-turbo-singlish](https://huggingface.co/mjwong/whisper-large-v3-turbo-singlish) | 1.53 | **16.99%** |
 | AMI | Draft-enhanced Large | Whisper-large-v3-singlish + [DRAFT](https://huggingface.co/mjwong/whisper-large-v3-singlish-DRAFT) | **2.27** | 22.06% |
+||||||
+| GigaSpeech | Large | [Whisper-large-v3-singlish](https://huggingface.co/mjwong/whisper-large-v3-singlish) | 1.00 | 13.15% |
+| GigaSpeech | Large-Turbo | [Whisper-large-v3-turbo-singlish](https://huggingface.co/mjwong/whisper-large-v3-turbo-singlish) | 1.95 | **11.54%** |
+| GigaSpeech | Draft-enhanced Large | Whisper-large-v3-singlish + [DRAFT](https://huggingface.co/mjwong/whisper-large-v3-singlish-DRAFT) | **2.37** | 12.81% |
 
 ### Speculative Acceptance Rates (DRAFT-enhanced Large Model)
 
-| **Dataset** | **Micro Avg Acceptance** | **Macro Avg Acceptance** |
+| **Dataset** | **Micro Avg Acceptance** | **Macro Avg Acceptance** |
 |----------------|--------------------------|---------------------------|
 | SASRBench-v1 | 38.00% | 42.00% |
 | AMI | 38.00% | 43.00% |
+| GigaSpeech | 31.00% | 37.00% |
 
 ### Conclusion
 
-While it does not outperform Large-Turbo in WER, the Draft-enhanced Large model demonstrates strong speculative acceptance rates (~38–43%), indicating meaningful potential for runtime gains through early prediction acceptance. In latency-sensitive applications, it offers a compelling middle ground between the high accuracy of Large-Turbo and the slower inference of standard decoding.
+While it does not outperform Large-Turbo in WER, the Draft-enhanced Large model demonstrates strong speculative acceptance rates (~31–43%), indicating meaningful potential for runtime gains through early prediction acceptance. In latency-sensitive applications, it offers a compelling middle ground between the high accuracy of Large-Turbo and the slower inference of standard decoding.
 
 ## Disclaimer
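
For context on the "Draft-enhanced Large" rows above, here is a minimal sketch of how such a speculative decoding setup could be run with 🤗 Transformers assisted generation. It assumes the DRAFT checkpoint loads as a standard Whisper seq2seq model and uses a placeholder input file; consult the model cards for the author's recommended usage.

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Main (verifier) model: Whisper-large-v3-singlish
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "mjwong/whisper-large-v3-singlish", torch_dtype=dtype
).to(device)

# Draft (assistant) model used to propose tokens during speculative decoding
assistant = AutoModelForSpeechSeq2Seq.from_pretrained(
    "mjwong/whisper-large-v3-singlish-DRAFT", torch_dtype=dtype
).to(device)

processor = AutoProcessor.from_pretrained("mjwong/whisper-large-v3-singlish")

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=dtype,
    device=device,
    # Passing the draft model enables assisted (speculative) generation:
    # the draft proposes tokens and the main model verifies them.
    generate_kwargs={"assistant_model": assistant},
)

print(asr("audio.wav")["text"])  # "audio.wav" is a placeholder audio path
```

The relative RTFx gains in the table come from the verifier accepting a share of the draft's proposed tokens each step, which is what the acceptance-rate table measures: higher acceptance means fewer full forward passes by the large model.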