Create README.md
README.md
ADDED
@@ -0,0 +1,56 @@
---
library_name: transformers
language:
- ko
base_model:
- openai/whisper-medium
---

### Model Description

This model is OpenAI's whisper-medium fine-tuned on the datasets listed below. On the test sets currently in use, its average performance is better than that of whisper-large-v3.

- Korean Speech (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=123)
- Address Speech Data (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=71556)
- Meeting Speech Recognition Data by Major Domain (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=464)
- Low-Quality Telephone Network Speech Recognition Data (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=571)
- Broadcast Content Conversational Speech Recognition Data (https://www.aihub.or.kr/aihubdata/data/view.do?dataSetSn=463)

Training setup

```
train_steps: 50000
warmup_steps: 500
lr scheduler: linear warmup cosine decay
max learning rate: 1e-4
batch size: 1024
max_grad_norm: 1.0
adamw_beta1: 0.9
adamw_beta2: 0.98
adamw_eps: 1e-6
```
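
The "linear warmup cosine decay" schedule named in the setup can be sketched as follows. This is a minimal illustration, not the actual training code: the step counts and peak rate come from the setup above, while the zero starting rate, the `MIN_LR` floor of 0.0, and the function name `learning_rate` are assumptions, since the source does not state them.

```python
import math

TRAIN_STEPS = 50_000   # train_steps from the setup above
WARMUP_STEPS = 500     # warmup_steps from the setup above
MAX_LR = 1e-4          # max learning rate from the setup above
MIN_LR = 0.0           # assumption: the final learning rate is not stated in the source


def learning_rate(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup, assumed to start from zero.
        return MAX_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to 1.0 past the last step.
    progress = min((step - WARMUP_STEPS) / (TRAIN_STEPS - WARMUP_STEPS), 1.0)
    # Cosine factor goes from 1.0 at the end of warmup to 0.0 at the last step.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine


if __name__ == "__main__":
    for step in (0, 250, 500, 25_250, 50_000):
        print(step, learning_rate(step))
```

Under these assumptions the rate peaks at step 500 and is halved at step 25250, the midpoint of the decay phase.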
### Evaluation

https://github.com/rtzr/Awesome-Korean-Speech-Recognition

The results below are for the test sets from the repository above, excluding the Meeting Speech by Major Domain set. In the table, whisper_medium_komixv2 is this model.

| Model                  | Average | cv_15_ko | fleurs_ko | kcall_testset | kconf_test | kcounsel_test | klec_testset | kspon_clean | kspon_other |
|------------------------|---------|----------|-----------|---------------|------------|---------------|--------------|-------------|-------------|
| whisper_tiny           | 36.63   | 31.03    | 18.48     | 58.57         | 36.02      | 33.52         | 35.74        | 42.22       | 37.42       |
| whisper_base           | 40.61   | 22.45    | 15.7      | 85.94         | 41.95      | 32.38         | 39.24        | 46.92       | 40.29       |
| whisper_small          | 17.52   | 11.56    | 6.33      | 30.79         | 18.96      | 13.57         | 18.71        | 22.02       | 18.23       |
| whisper_medium         | 13.92   | 8.2      | 4.38      | 25.73         | 15.66      | 10.1          | 14.9         | 17.16       | 15.22       |
| whisper_large          | 12.77   | 6.83     | 3.9       | 22.68         | 14.35      | 9.2           | 13.89        | 16.78       | 14.56       |
| whisper_large_v2       | 12.29   | 6.58     | 3.74      | 22.26         | 13.88      | 8.95          | 13.84        | 15.51       | 13.6        |
| whisper_large_v3       | 7.99    | 5.11     | 3.72      | 5.45          | 9.35       | 3.83          | 8.46         | 15.08       | 12.89       |
| whisper_large_v3_turbo | 10.75   | 5.38     | 3.99      | 10.93         | 10.27      | 4.21          | 9.42         | 26.66       | 15.16       |
| whisper_base_komixv2   | 8.73    | 10.27    | 5.14      | 6.23          | 10.86      | 7.01          | 10.38        | 9.98        | 9.99        |
| whisper_small_komixv2  | 7.36    | 7.07     | 4.19      | 5.6           | 9.67       | 5.5           | 8.55         | 9.26        | 9.07        |
| whisper_medium_komixv2 | 7.3     | 6.62     | 4.52      | 5.85          | 9.42       | 5.47          | 8.38         | 9.19        | 8.97        |
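
The Average column appears to be the unweighted mean of the eight per-test-set columns. The source does not say so explicitly, and it does not name the metric, so the sketch below only checks the arithmetic for two rows copied from the table. The dictionary and the helper name `average` are illustrative.

```python
# Per-test-set scores copied from the table, in column order:
# cv_15_ko, fleurs_ko, kcall_testset, kconf_test,
# kcounsel_test, klec_testset, kspon_clean, kspon_other
SCORES = {
    "whisper_large_v3": [5.11, 3.72, 5.45, 9.35, 3.83, 8.46, 15.08, 12.89],
    "whisper_medium_komixv2": [6.62, 4.52, 5.85, 9.42, 5.47, 8.38, 9.19, 8.97],
}


def average(scores: list[float]) -> float:
    """Unweighted mean rounded to two decimals (assumed to match the Average column)."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(model, average(scores))
```

Both rows reproduce the reported averages (7.99 and 7.3), which is consistent with every test set being weighted equally regardless of its size.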
### Acknowledgement
- This model was trained with support from Google's TRC program.
- Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)