@@ -40,19 +40,19 @@ See [https://github.com/kotoba-tech/kotoba-whisper](https://github.com/kotoba-te
 Because of the cascaded approach (speech recognition followed by a separate text-translation model), the pipeline is more complex than a single end-to-end OpenAI Whisper model; this extra complexity is the trade-off for high accuracy.
 The following table shows the mean inference time in seconds, averaged over 10 trials, on audio samples of different durations (column headers give the audio duration in seconds).
-| model                                                                                                                                                                                                     |    10 |    30 |    60 |
-|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------:|------:|------:|
-| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B))                     | 0.173 | 0.247 | 0.352 |
-| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B))                     | 0.173 | 0.24  | 0.348 |
-| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | 0.17  | 0.245 | 0.348 |
-| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | 0.108 | 0.179 | 0.283 |
-| [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)                                                                                                                                 | 0.061 | 0.184 | 0.372 |
-| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)                                                                                                                                 | 0.062 | 0.199 | 0.415 |
-| [openai/whisper-large](https://huggingface.co/openai/whisper-large)                                                                                                                                       | 0.062 | 0.183 | 0.363 |
-| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)                                                                                                                                     | 0.045 | 0.132 | 0.266 |
-| [openai/whisper-small](https://huggingface.co/openai/whisper-small)                                                                                                                                       | 0.135 | 0.376 | 0.631 |
-| [openai/whisper-base](https://huggingface.co/openai/whisper-base)                                                                                                                                         | 0.054 | 0.108 | 0.231 |
-| [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)                                                                                                                                         | 0.045 | 0.124 | 0.208 |
+| model                                                                                                                                                                                                     |  10 s |  30 s |  60 s | 300 s |
+|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------:|------:|------:|------:|
+| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B))                     | 0.173 | 0.247 | 0.352 | 1.772 |
+| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B))                     | 0.173 | 0.24  | 0.348 | 1.515 |
+| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)) | 0.17  | 0.245 | 0.348 | 1.882 |
+| [japanese-asr/ja-cascaded-s2t-translation](https://huggingface.co/japanese-asr/ja-cascaded-s2t-translation) ([facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)) | 0.108 | 0.179 | 0.283 | 1.33  |
+| [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)                                                                                                                                 | 0.061 | 0.184 | 0.372 | 1.804 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2)                                                                                                                                 | 0.062 | 0.199 | 0.415 | 1.854 |
+| [openai/whisper-large](https://huggingface.co/openai/whisper-large)                                                                                                                                       | 0.062 | 0.183 | 0.363 | 1.899 |
+| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)                                                                                                                                     | 0.045 | 0.132 | 0.266 | 1.368 |
+| [openai/whisper-small](https://huggingface.co/openai/whisper-small)                                                                                                                                       | 0.135 | 0.376 | 0.631 | 3.495 |
+| [openai/whisper-base](https://huggingface.co/openai/whisper-base)                                                                                                                                         | 0.054 | 0.108 | 0.231 | 1.019 |
+| [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)                                                                                                                                         | 0.045 | 0.124 | 0.208 | 0.838 |
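The benchmark above is described only as a mean over 10 trials; the hardware, batch size, and warm-up policy are not stated. As a minimal sketch of that kind of measurement, assuming the pipeline under test can be wrapped as a plain Python callable, the hypothetical helper `mean_inference_time` below times repeated calls with `time.perf_counter` and averages them. It is not the script used to produce the table.

```python
import statistics
import time
from typing import Any, Callable


def mean_inference_time(run: Callable[[Any], Any], sample: Any, trials: int = 10) -> float:
    """Return the mean wall-clock time in seconds of run(sample) over `trials` calls.

    Hypothetical helper: `run` stands in for whatever pipeline is being benchmarked.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        run(sample)  # e.g. one pipeline call on one audio sample (assumption)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without downloading any model.
    def dummy_pipeline(audio):
        time.sleep(0.01)
        return {"text": "placeholder"}

    print(f"{mean_inference_time(dummy_pipeline, None):.3f} s")
```

If the callable can return before GPU work has finished, synchronize the device before reading the clock (in PyTorch, `torch.cuda.synchronize()`); whether the reported numbers discard a warm-up run is not stated.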
 ## Usage
 Here is an example of translating Japanese speech into English text.