Model Description

This model fine-tunes OpenAI's whisper-base on the datasets below.

Training setup

train_steps: 50000
warmup_steps: 500
lr_scheduler: linear warmup, cosine decay
max_learning_rate: 1e-4
batch_size: 1024
max_grad_norm: 1.0
adamw_beta1: 0.9
adamw_beta2: 0.98
adamw_eps: 1e-6
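The schedule named above (linear warmup followed by cosine decay) can be sketched in plain Python. The decay floor of 0 is an assumption, since the card does not state a minimum learning rate:

```python
import math

def lr_at(step, max_lr=1e-4, warmup_steps=500, total_steps=50_000, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay toward min_lr.

    A sketch of the schedule described in the card; min_lr=0.0 is an
    assumption, not a documented value.
    """
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

print(lr_at(250))     # halfway through warmup -> 5e-05
print(lr_at(500))     # peak learning rate -> 0.0001
print(lr_at(50_000))  # end of training -> ~0.0
```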

Evaluation

https://github.com/rtzr/Awesome-Korean-Speech-Recognition

μœ„ λ ˆν¬μ§€ν† λ¦¬μ—μ„œ μ£Όμš” μ˜μ—­λ³„ 회의 μŒμ„±μ„ μ œμ™Έν•œ ν…ŒμŠ€νŠΈμ…‹ κ²°κ³Όμž…λ‹ˆλ‹€. μ•„λž˜ ν…Œμ΄λΈ”μ—μ„œ whisper_base_komixv2κ°€ λ³Έ λͺ¨λΈ μ„±λŠ₯μž…λ‹ˆλ‹€.

| Model | Average | cv_15_ko | fleurs_ko | kcall_testset | kconf_test | kcounsel_test | klec_testset | kspon_clean | kspon_other |
|---|---|---|---|---|---|---|---|---|---|
| whisper_tiny | 36.63 | 31.03 | 18.48 | 58.57 | 36.02 | 33.52 | 35.74 | 42.22 | 37.42 |
| whisper_base | 40.61 | 22.45 | 15.7 | 85.94 | 41.95 | 32.38 | 39.24 | 46.92 | 40.29 |
| whisper_small | 17.52 | 11.56 | 6.33 | 30.79 | 18.96 | 13.57 | 18.71 | 22.02 | 18.23 |
| whisper_medium | 13.92 | 8.2 | 4.38 | 25.73 | 15.66 | 10.1 | 14.9 | 17.16 | 15.22 |
| whisper_large | 12.77 | 6.83 | 3.9 | 22.68 | 14.35 | 9.2 | 13.89 | 16.78 | 14.56 |
| whisper_large_v2 | 12.29 | 6.58 | 3.74 | 22.26 | 13.88 | 8.95 | 13.84 | 15.51 | 13.6 |
| whisper_large_v3 | 7.99 | 5.11 | 3.72 | 5.45 | 9.35 | 3.83 | 8.46 | 15.08 | 12.89 |
| whisper_large_v3_turbo | 10.75 | 5.38 | 3.99 | 10.93 | 10.27 | 4.21 | 9.42 | 26.66 | 15.16 |
| whisper_base_komixv2 | 8.73 | 10.27 | 5.14 | 6.23 | 10.86 | 7.01 | 10.38 | 9.98 | 9.99 |
| whisper_small_komixv2 | 7.63 | 7.2 | 4.63 | 5.47 | 9.79 | 6.16 | 8.68 | 9.65 | 9.44 |
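The numbers above are error rates in percent; the exact metric and text normalization are defined by the linked repository. A minimal character-error-rate sketch, assuming plain Levenshtein distance with no normalization:

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent: edit distance / reference length.

    A minimal sketch; the benchmark repository applies its own text
    normalization before scoring, which is omitted here.
    """
    # Single-row dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1] / len(ref) * 100

print(cer("abcd", "abcf"))  # one substitution out of four chars -> 25.0
```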

Acknowledgement

  • λ³Έ λͺ¨λΈμ€ κ΅¬κΈ€μ˜ TRC ν”„λ‘œκ·Έλž¨μ˜ μ§€μ›μœΌλ‘œ ν•™μŠ΅ν–ˆμŠ΅λ‹ˆλ‹€.
  • Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)