Korean-Whisper Collection
A collection of OpenAI Whisper models fine-tuned on diverse Korean datasets. Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC).
This is OpenAI's whisper-base model fine-tuned on a Korean dataset.
Training setup
train_steps: 50000
warmup_steps: 500
lr scheduler: linear warmup cosine decay
max learning rate: 1e-4
batch size: 1024
max_grad_norm: 1.0
adamw_beta1: 0.9
adamw_beta2: 0.98
adamw_eps: 1e-6
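The warmup and decay settings above can be sketched as a step-to-learning-rate function. This is a minimal illustration of "linear warmup cosine decay" with the listed hyperparameters; the decay-to-zero floor is an assumption, since the card does not state a minimum learning rate.

```python
import math

# Hyperparameters as listed in the training setup above.
MAX_LR = 1e-4
WARMUP_STEPS = 500
TRAIN_STEPS = 50_000

def learning_rate(step: int) -> float:
    """Learning rate at a given optimizer step (sketch, min LR assumed 0)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to MAX_LR over the first 500 steps.
        return MAX_LR * step / WARMUP_STEPS
    # Cosine decay from MAX_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TRAIN_STEPS - WARMUP_STEPS)
    return 0.5 * MAX_LR * (1.0 + math.cos(math.pi * progress))
```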
https://github.com/rtzr/Awesome-Korean-Speech-Recognition
These are test results on the evaluation sets from the repository above, excluding the major commercial vendors' recognizers. In the table below, whisper_base_komixv2 is this model's performance.
Model | Average | cv_15_ko | fleurs_ko | kcall_testset | kconf_test | kcounsel_test | klec_testset | kspon_clean | kspon_other |
---|---|---|---|---|---|---|---|---|---|
whisper_tiny | 36.63 | 31.03 | 18.48 | 58.57 | 36.02 | 33.52 | 35.74 | 42.22 | 37.42 |
whisper_base | 40.61 | 22.45 | 15.7 | 85.94 | 41.95 | 32.38 | 39.24 | 46.92 | 40.29 |
whisper_small | 17.52 | 11.56 | 6.33 | 30.79 | 18.96 | 13.57 | 18.71 | 22.02 | 18.23 |
whisper_medium | 13.92 | 8.2 | 4.38 | 25.73 | 15.66 | 10.1 | 14.9 | 17.16 | 15.22 |
whisper_large | 12.77 | 6.83 | 3.9 | 22.68 | 14.35 | 9.2 | 13.89 | 16.78 | 14.56 |
whisper_large_v2 | 12.29 | 6.58 | 3.74 | 22.26 | 13.88 | 8.95 | 13.84 | 15.51 | 13.6 |
whisper_large_v3 | 7.99 | 5.11 | 3.72 | 5.45 | 9.35 | 3.83 | 8.46 | 15.08 | 12.89 |
whisper_large_v3_turbo | 10.75 | 5.38 | 3.99 | 10.93 | 10.27 | 4.21 | 9.42 | 26.66 | 15.16 |
whisper_base_komixv2 | 8.73 | 10.27 | 5.14 | 6.23 | 10.86 | 7.01 | 10.38 | 9.98 | 9.99 |
whisper_small_komixv2 | 7.63 | 7.2 | 4.63 | 5.47 | 9.79 | 6.16 | 8.68 | 9.65 | 9.44 |
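The Average column appears to be the unweighted mean of the eight per-testset scores. For example, recomputing it for the whisper_base_komixv2 row:

```python
# Per-testset scores for whisper_base_komixv2, copied from the table above
# (cv_15_ko, fleurs_ko, kcall_testset, kconf_test, kcounsel_test,
#  klec_testset, kspon_clean, kspon_other).
komixv2_scores = [10.27, 5.14, 6.23, 10.86, 7.01, 10.38, 9.98, 9.99]

# Unweighted mean over the eight test sets.
average = sum(komixv2_scores) / len(komixv2_scores)
print(round(average, 2))  # 8.73, matching the table's Average column
```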
Base model
openai/whisper-base
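A minimal inference sketch using the Hugging Face transformers ASR pipeline. The repo id below is a placeholder for this model's actual Hugging Face id, and `audio.wav` is assumed to be a local speech file:

```python
from transformers import pipeline

# NOTE: "your-org/whisper-base-komixv2" is a placeholder; substitute this
# model's actual Hugging Face repo id before running.
asr = pipeline(
    "automatic-speech-recognition",
    model="your-org/whisper-base-komixv2",
    generate_kwargs={"language": "korean", "task": "transcribe"},
)

# Transcribe a local audio file and print the recognized Korean text.
print(asr("audio.wav")["text"])
```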