metadata
library_name: transformers
language:
  - ko
base_model:
  - openai/whisper-base

Model Description

This model is OpenAI's whisper-base model trained on the datasets below.

Training setup

train_steps: 50000
warmup_steps: 500
lr scheduler: linear warmup cosine decay
max learning rate: 1e-4
batch size: 1024
max_grad_norm: 1.0
adamw_beta1: 0.9
adamw_beta2: 0.98
adamw_eps: 1e-6
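
As an illustration of the schedule listed above, here is a minimal sketch of linear warmup followed by cosine decay, using the step counts and peak learning rate from this card. The final learning rate (`MIN_LR`) and the function name `learning_rate` are assumptions made for the example; the card does not state a floor value, and this is not the training code that was actually used.

```python
import math

# Values taken from the training setup above.
TRAIN_STEPS = 50_000
WARMUP_STEPS = 500
MAX_LR = 1e-4

# Assumption: the card does not state a final learning rate, so decay to zero.
MIN_LR = 0.0


def learning_rate(step: int) -> float:
    """Return the learning rate at a given optimizer step.

    Linear warmup from 0 to MAX_LR over WARMUP_STEPS, then cosine decay
    from MAX_LR to MIN_LR over the remaining steps.
    """
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - WARMUP_STEPS) / (TRAIN_STEPS - WARMUP_STEPS)
    progress = min(max(progress, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine


if __name__ == "__main__":
    # Print the schedule at a few representative steps.
    for s in (0, 250, 500, 25_250, 50_000):
        print(s, learning_rate(s))
```

Under these assumptions the rate reaches its peak at step 500, is at half the peak midway through the decay phase, and ends at `MIN_LR` on the last step.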

Evaluation

https://github.com/rtzr/Awesome-Korean-Speech-Recognition

These are results on the test sets from the repository above, excluding the meeting speech by major domain (주요 영역별 회의 음성) test set. In the table below, whisper_base_komixv2 is this model.

| Model | Average | cv_15_ko | fleurs_ko | kcall_testset | kconf_test | kcounsel_test | klec_testset | kspon_clean | kspon_other |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| whisper_tiny | 36.63 | 31.03 | 18.48 | 58.57 | 36.02 | 33.52 | 35.74 | 42.22 | 37.42 |
| whisper_base | 40.61 | 22.45 | 15.7 | 85.94 | 41.95 | 32.38 | 39.24 | 46.92 | 40.29 |
| whisper_small | 17.52 | 11.56 | 6.33 | 30.79 | 18.96 | 13.57 | 18.71 | 22.02 | 18.23 |
| whisper_medium | 13.92 | 8.2 | 4.38 | 25.73 | 15.66 | 10.1 | 14.9 | 17.16 | 15.22 |
| whisper_large | 12.77 | 6.83 | 3.9 | 22.68 | 14.35 | 9.2 | 13.89 | 16.78 | 14.56 |
| whisper_large_v2 | 12.29 | 6.58 | 3.74 | 22.26 | 13.88 | 8.95 | 13.84 | 15.51 | 13.6 |
| whisper_large_v3 | 7.99 | 5.11 | 3.72 | 5.45 | 9.35 | 3.83 | 8.46 | 15.08 | 12.89 |
| whisper_large_v3_turbo | 10.75 | 5.38 | 3.99 | 10.93 | 10.27 | 4.21 | 9.42 | 26.66 | 15.16 |
| whisper_base_komixv2 | 8.73 | 10.27 | 5.14 | 6.23 | 10.86 | 7.01 | 10.38 | 9.98 | 9.99 |
| whisper_small_komixv2 | 7.63 | 7.2 | 4.63 | 5.47 | 9.79 | 6.16 | 8.68 | 9.65 | 9.44 |
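
The Average column appears to be the unweighted mean of the eight per-test-set scores. The sketch below checks that reading for three rows copied from the table. The helper names (`mean`, `matches_reported`) and the rounding tolerance are assumptions for illustration, not part of the evaluation repository.

```python
# Rows copied from the evaluation table:
# model name -> (reported Average, scores for the eight test sets in column order).
ROWS = {
    "whisper_large_v3": (7.99, [5.11, 3.72, 5.45, 9.35, 3.83, 8.46, 15.08, 12.89]),
    "whisper_base_komixv2": (8.73, [10.27, 5.14, 6.23, 10.86, 7.01, 10.38, 9.98, 9.99]),
    "whisper_small_komixv2": (7.63, [7.2, 4.63, 5.47, 9.79, 6.16, 8.68, 9.65, 9.44]),
}


def mean(scores):
    """Unweighted arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


def matches_reported(reported, scores, tol=0.006):
    """True if the mean agrees with the reported value up to two-decimal rounding."""
    return abs(mean(scores) - reported) <= tol


if __name__ == "__main__":
    for name, (reported, scores) in ROWS.items():
        print(name, reported, matches_reported(reported, scores))
```

Because the mean is unweighted, each test set counts equally toward the Average regardless of its size.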

Acknowledgement

  • This model was trained with support from Google's TRC program.
  • Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)