---
library_name: transformers
tags: []
---

### Model Description

This model is OpenAI's whisper-base model trained on the datasets below. It was trained using the phonetic form of the transcripts.

- Korean Speech (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=123)
- Address Speech Data (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=71556)
- Major-Domain Meeting Speech Recognition Data (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=464)
- Low-Quality Telephone Network Speech Recognition Data (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=571)
- Broadcast Content Conversational Speech Recognition Data (https://www.aihub.or.kr/aihubdata/data/view.do?dataSetSn=463)

```
train_steps: 20000
warmup_steps: 2000
lr scheduler: linear warmup cosine decay
max learning rate: 1e-4
batch size: 256
max_grad_norm: 1.0
adamw_beta1: 0.9
adamw_beta2: 0.98
```
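
The learning-rate schedule named in the configuration above can be sketched in plain Python. This is only an illustration, not the training code used for this model: the step counts and peak rate come from the configuration, while the starting value (`INIT_LR`) and final value (`END_LR`) are assumptions, since the card does not state them.

```python
import math

# Values taken from the configuration above.
TRAIN_STEPS = 20_000
WARMUP_STEPS = 2_000
MAX_LR = 1e-4

# Assumptions (not stated in the model card): the schedule starts at 0
# and decays to 0 at the final training step.
INIT_LR = 0.0
END_LR = 0.0


def learning_rate(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay to END_LR."""
    if step < WARMUP_STEPS:
        # Linear ramp from INIT_LR to MAX_LR over the warmup steps.
        return INIT_LR + (MAX_LR - INIT_LR) * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to at most 1.
    progress = min(1.0, (step - WARMUP_STEPS) / (TRAIN_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return END_LR + (MAX_LR - END_LR) * cosine


if __name__ == "__main__":
    for step in (0, 1_000, 2_000, 11_000, 20_000):
        print(f"step {step:>6}: lr = {learning_rate(step):.2e}")
```

Under these assumptions the rate reaches its peak of 1e-4 at step 2000 and falls to half of that at step 11000, the midpoint of the decay phase.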

### Evaluation

https://github.com/rtzr/Awesome-Korean-Speech-Recognition

The results below are for the test sets in the repository above, excluding the Major-Domain Meeting Speech test set. In the table, whisper_base_komixv2_phn is this model.

| Model                    | cv_15_ko | fleurs_ko | kcall_testset | kconf_test | kcounsel_test | klec_testset | kspon_clean | kspon_other |
|--------------------------|----------|-----------|---------------|------------|---------------|--------------|-------------|-------------|
| whisper_base             | 21.16    | 11.89     | 42.56         | 27.62      | 22.24         | 28.65        | 30.41       | 27.02       |
| whisper_base_komix       | 15.42    | 7.16      | 20.86         | 14.24      | 12.64         | 13.44        | 12.26       | 12.12       |
| whisper_base_komixv2     | 13.04    | 7.04      | 10.54         | 13.1       | 10.65         | 12.99        | 12.44       | 12.56       |
| whisper_base_komixv2_phn | 12.81    | 8.27      | 9.5           | 13.26      | 11.33         | 14.24        | 13.11       | 13.3        |
| whisper_large_v3         | 5.11     | 3.72      | 5.45          | 9.35       | 3.83          | 8.46         | 15.08       | 12.89       |
| whisper_large_v3_turbo   | 5.38     | 3.95      | 5.89          | 9.77       | 4.21          | 9.27         | 16.49       | 13.54       |
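
To compare rows of the table, a small helper like the following may be useful. It is a sketch that assumes the scores are error rates where lower is better, which this card does not state explicitly. The numbers are copied from the table; the function names `relative_reduction`, `compare`, and `mean_score` are made up for this illustration.

```python
# Scores copied from the table above; assumed to be error rates (lower is better).
TEST_SETS = [
    "cv_15_ko", "fleurs_ko", "kcall_testset", "kconf_test",
    "kcounsel_test", "klec_testset", "kspon_clean", "kspon_other",
]
RESULTS = {
    "whisper_base": [21.16, 11.89, 42.56, 27.62, 22.24, 28.65, 30.41, 27.02],
    "whisper_base_komix": [15.42, 7.16, 20.86, 14.24, 12.64, 13.44, 12.26, 12.12],
    "whisper_base_komixv2": [13.04, 7.04, 10.54, 13.1, 10.65, 12.99, 12.44, 12.56],
    "whisper_base_komixv2_phn": [12.81, 8.27, 9.5, 13.26, 11.33, 14.24, 13.11, 13.3],
    "whisper_large_v3": [5.11, 3.72, 5.45, 9.35, 3.83, 8.46, 15.08, 12.89],
    "whisper_large_v3_turbo": [5.38, 3.95, 5.89, 9.77, 4.21, 9.27, 16.49, 13.54],
}


def relative_reduction(baseline: float, model: float) -> float:
    """Relative reduction in percent; positive means `model` scores lower."""
    return 100.0 * (baseline - model) / baseline


def compare(baseline_name: str, model_name: str) -> dict[str, float]:
    """Per-test-set relative reduction of `model_name` against `baseline_name`."""
    baseline = RESULTS[baseline_name]
    model = RESULTS[model_name]
    return {
        name: relative_reduction(b, m)
        for name, b, m in zip(TEST_SETS, baseline, model)
    }


def mean_score(model_name: str) -> float:
    """Unweighted mean over the eight test sets (a rough summary only)."""
    scores = RESULTS[model_name]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for name, value in compare("whisper_base", "whisper_base_komixv2_phn").items():
        print(f"{name:>14}: {value:+.1f}%")
    print(f"mean score: {mean_score('whisper_base_komixv2_phn'):.2f}")
```

The unweighted mean treats every test set equally regardless of its size, so it is best read as a coarse summary alongside the per-set numbers.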

### Acknowledgement

- This model was trained with support from Google's TRC program.
- Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)