Model Description

OpenAI의 whisper-base λͺ¨λΈμ„ μ•„λž˜ λ°μ΄ν„°μ…‹μœΌλ‘œ ν•™μŠ΅ν•œ λͺ¨λΈμž…λ‹ˆλ‹€. phonetic form을 μ‚¬μš©ν•˜μ—¬ ν•™μŠ΅λ˜μ—ˆμŠ΅λ‹ˆλ‹€.

train_steps: 20000
warmup_steps: 2000
lr scheduler: linear warmup cosine decay
max learning rate: 1e-4
batch size: 256
max_grad_norm: 1.0
adamw_beta1: 0.9
adamw_beta2: 0.98

Evaluation

https://github.com/rtzr/Awesome-Korean-Speech-Recognition

μœ„ λ ˆν¬μ§€ν† λ¦¬μ—μ„œ μ£Όμš” μ˜μ—­λ³„ 회의 μŒμ„±μ„ μ œμ™Έν•œ ν…ŒμŠ€νŠΈμ…‹ κ²°κ³Όμž…λ‹ˆλ‹€. μ•„λž˜ ν…Œμ΄λΈ”μ—μ„œ whisper_base_komixv2_phnκ°€ λ³Έ λͺ¨λΈ μ„±λŠ₯μž…λ‹ˆλ‹€.

Model cv_15_ko fleurs_ko kcall_testset kconf_test kcounsel_test klec_testset kspon_clean kspon_other
whisper_base 21.16 11.89 42.56 27.62 22.24 28.65 30.41 27.02
whisper_base_komix 15.42 7.16 20.86 14.24 12.64 13.44 12.26 12.12
whisper_base_komixv2 13.04 7.04 10.54 13.1 10.65 12.99 12.44 12.56
whisper_base_komixv2_phn 12.81 8.27 9.5 13.26 11.33 14.24 13.11 13.3
whisper_large_v3 5.11 3.72 5.45 9.35 3.83 8.46 15.08 12.89
whisper_large_v3_turbo 5.38 3.95 5.89 9.77 4.21 9.27 16.49 13.54

Acknowledgement

  • λ³Έ λͺ¨λΈμ€ κ΅¬κΈ€μ˜ TRC ν”„λ‘œκ·Έλž¨μ˜ μ§€μ›μœΌλ‘œ ν•™μŠ΅ν–ˆμŠ΅λ‹ˆλ‹€.
  • Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)
Downloads last month
0
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.