---
library_name: transformers
tags: []
---
### Model Description
This model is OpenAI's whisper-base fine-tuned on the datasets listed below. It was trained on transcripts written in phonetic form.
- Korean Speech (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=123)
- Address Speech Data (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=71556)
- Meeting Speech Recognition Data by Major Domain (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=464)
- Low-Quality Telephone Network Speech Recognition Data (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=571)
- Broadcast Content Conversational Speech Recognition Data (https://www.aihub.or.kr/aihubdata/data/view.do?dataSetSn=463)
```
train_steps: 20000
warmup_steps: 2000
lr scheduler: linear warmup cosine decay
max learning rate: 1e-4
batch size: 256
max_grad_norm: 1.0
adamw_beta1: 0.9
adamw_beta2: 0.98
```
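The schedule named above (linear warmup followed by cosine decay) can be sketched in plain Python as follows. This is a minimal illustration, not the training code used for this model: the function name `learning_rate` is hypothetical, and it assumes the warmup starts at 0 and the cosine decay ends at 0 at step 20000, neither of which the configuration states.

```python
import math

# Values taken from the configuration above.
MAX_LR = 1e-4
WARMUP_STEPS = 2000
TRAIN_STEPS = 20000


def learning_rate(step, max_lr=MAX_LR, warmup_steps=WARMUP_STEPS,
                  train_steps=TRAIN_STEPS, min_lr=0.0):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    Assumptions (not stated in the model card): warmup starts from 0,
    and the decay ends at `min_lr` = 0 on the last training step.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to max_lr.
        return max_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, train_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Half-cosine from 1 down to 0.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


if __name__ == "__main__":
    for s in (0, 1000, 2000, 11000, 20000):
        print(s, learning_rate(s))
```

Under these assumptions the rate peaks at 1e-4 exactly when warmup ends (step 2000) and falls to half the peak at the midpoint of the decay phase (step 11000).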
### Evaluation
https://github.com/rtzr/Awesome-Korean-Speech-Recognition
These are the results on the test sets from the repository above, excluding the major-domain meeting speech set. In the table below, whisper_base_komixv2_phn is this model.
| Model | cv_15_ko | fleurs_ko | kcall_testset | kconf_test | kcounsel_test | klec_testset | kspon_clean | kspon_other |
|-----------------------|----------|-----------|---------------|------------|---------------|--------------|-------------|-------------|
| whisper_base | 21.16 | 11.89 | 42.56 | 27.62 | 22.24 | 28.65 | 30.41 | 27.02 |
| whisper_base_komix | 15.42 | 7.16 | 20.86 | 14.24 | 12.64 | 13.44 | 12.26 | 12.12 |
| whisper_base_komixv2 | 13.04 | 7.04 | 10.54 | 13.1 | 10.65 | 12.99 | 12.44 | 12.56 |
| whisper_base_komixv2_phn | 12.81 | 8.27 | 9.5 | 13.26 | 11.33 | 14.24 | 13.11 | 13.3 |
| whisper_large_v3 | 5.11 | 3.72 | 5.45 | 9.35 | 3.83 | 8.46 | 15.08 | 12.89 |
| whisper_large_v3_turbo | 5.38 | 3.95 | 5.89 | 9.77 | 4.21 | 9.27 | 16.49 | 13.54 |
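To make the comparison with the original whisper_base easier to read, the two rows can be turned into per-test-set relative reductions. The sketch below copies the numbers from the table. It assumes the scores are error rates where lower is better (the table does not name the metric), and the helper name `relative_reduction` is hypothetical.

```python
# Column order follows the table above.
TEST_SETS = [
    "cv_15_ko", "fleurs_ko", "kcall_testset", "kconf_test",
    "kcounsel_test", "klec_testset", "kspon_clean", "kspon_other",
]

# Scores copied from the table; assumed to be error rates (lower is better).
SCORES = {
    "whisper_base": [21.16, 11.89, 42.56, 27.62, 22.24, 28.65, 30.41, 27.02],
    "whisper_base_komixv2_phn": [12.81, 8.27, 9.5, 13.26, 11.33, 14.24, 13.11, 13.3],
}


def relative_reduction(baseline, model):
    """Fraction by which `model` lowers the score relative to `baseline`."""
    return (baseline - model) / baseline


reductions = {
    name: relative_reduction(base, ours)
    for name, base, ours in zip(
        TEST_SETS, SCORES["whisper_base"], SCORES["whisper_base_komixv2_phn"]
    )
}

if __name__ == "__main__":
    for name, value in reductions.items():
        print(f"{name}: {value:.1%}")
```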
### Acknowledgement
- This model was trained with support from Google's TRC program.
- Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)