---
library_name: transformers
language:
- ko
base_model:
- openai/whisper-base
---

### Model Description

This model is OpenAI's whisper-base fine-tuned on the following AI Hub datasets:
- Korean speech (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=123)
- Address speech data (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=71556)
- Domain-specific meeting speech recognition data (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=464)
- Low-quality telephone network speech recognition data (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=571)
- Broadcast content conversational speech recognition data (https://www.aihub.or.kr/aihubdata/data/view.do?dataSetSn=463)

### Training Setup

```
train_steps: 50000
warmup_steps: 500
lr scheduler: linear warmup cosine decay
max learning rate: 1e-4
batch size: 1024
max_grad_norm: 1.0
adamw_beta1: 0.9
adamw_beta2: 0.98
adamw_eps: 1e-6
```
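The "linear warmup cosine decay" schedule above can be sketched as a plain function of the optimizer step, using `max learning rate`, `warmup_steps`, and `train_steps` from the config. This is an illustration under assumptions, not the training code: the card does not state the final learning rate, so the sketch assumes a decay to 0 at step 50,000, and the names `learning_rate` and `MIN_LR` are hypothetical.

```python
import math

# Values taken from the training setup above.
MAX_LR = 1e-4
WARMUP_STEPS = 500
TRAIN_STEPS = 50_000
# Assumption: the card does not state the final learning rate; decay to 0 here.
MIN_LR = 0.0


def learning_rate(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to MAX_LR over the warmup steps.
        return MAX_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - WARMUP_STEPS) / (TRAIN_STEPS - WARMUP_STEPS))
    # Half-cosine from MAX_LR down to MIN_LR.
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


for s in (0, 250, 500, 25_250, 50_000):
    print(f"step {s:>6}: lr = {learning_rate(s):.2e}")
```

For a PyTorch optimizer, `transformers.get_cosine_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps)` produces the same shape (decaying to 0 by default); the card does not say which implementation was actually used.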

### Evaluation

Evaluated on the test sets collected in https://github.com/rtzr/Awesome-Korean-Speech-Recognition, excluding the domain-specific meeting speech set. In the table below, whisper_base_komixv2 is this model. Scores are error rates (lower is better), and Average is the unweighted mean over the eight test sets.


|         Model          | Average | cv_15_ko | fleurs_ko | kcall_testset | kconf_test | kcounsel_test | klec_testset | kspon_clean | kspon_other |
|------------------------|---------|----------|-----------|---------------|------------|---------------|--------------|-------------|-------------|
|      whisper_tiny      |  36.63  |  31.03   |   18.48   |     58.57     |   36.02    |     33.52     |    35.74     |    42.22    |    37.42    |
|  whisper_tiny_komixv2  |   11.6  |  14.56   |    6.54   |      9.12     |   13.19    |     11.62     |    13.16     |    12.13    |    12.52    |
|      whisper_base      |  40.61  |  22.45   |    15.7   |     85.94     |   41.95    |     32.38     |    39.24     |    46.92    |    40.29    |
|  whisper_base_komixv2  |   8.73  |  10.27   |    5.14   |      6.23     |   10.86    |      7.01     |    10.38     |     9.98    |     9.99    |
|     whisper_small      |  17.52  |  11.56   |    6.33   |     30.79     |   18.96    |     13.57     |    18.71     |    22.02    |    18.23    |
| whisper_small_komixv2  |   7.36  |   7.07   |    4.19   |      5.6      |    9.67    |      5.5      |     8.55     |     9.26    |     9.07    |
|     whisper_medium     |  13.92  |   8.2    |    4.38   |     25.73     |   15.66    |      10.1     |     14.9     |    17.16    |    15.22    |
| whisper_medium_komixv2 |   7.3   |   6.62   |    4.52   |      5.85     |    9.42    |      5.47     |     8.38     |     9.19    |     8.97    |
|    whisper_large_v3    |   7.99  |   5.11   |    3.72   |      5.45     |    9.35    |      3.83     |     8.46     |    15.08    |    12.89    |
| whisper_large_v3_turbo |  10.75  |   5.38   |    3.99   |     10.93     |   10.27    |      4.21     |     9.42     |    26.66    |    15.16    |
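As a quick arithmetic check, the Average column can be reproduced from the per-test-set scores. The sketch below copies the whisper_base and whisper_base_komixv2 rows from the table and computes each average plus the relative error reduction from fine-tuning. The variable names are illustrative, and the sketch makes no assumption about which error metric the scores represent.

```python
from statistics import mean

# Scores copied from the table above (lower is better), in column order.
TEST_SETS = ["cv_15_ko", "fleurs_ko", "kcall_testset", "kconf_test",
             "kcounsel_test", "klec_testset", "kspon_clean", "kspon_other"]
whisper_base = [22.45, 15.7, 85.94, 41.95, 32.38, 39.24, 46.92, 40.29]
whisper_base_komixv2 = [10.27, 5.14, 6.23, 10.86, 7.01, 10.38, 9.98, 9.99]

# Unweighted mean over the eight test sets, as in the Average column.
avg_base = mean(whisper_base)
avg_komixv2 = mean(whisper_base_komixv2)
print(f"whisper_base average:         {avg_base:.2f}")
print(f"whisper_base_komixv2 average: {avg_komixv2:.2f}")

# Relative error reduction from fine-tuning, overall and per test set.
print(f"overall reduction: {1 - avg_komixv2 / avg_base:.1%}")
for name, before, after in zip(TEST_SETS, whisper_base, whisper_base_komixv2):
    print(f"  {name:<14} {before:6.2f} -> {after:6.2f} ({1 - after / before:.1%})")
```

The overall reduction works out to roughly 78.5%, with the largest per-set gain on kcall_testset (85.94 to 6.23).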


### Acknowledgement
- λ³Έ λͺ¨λΈμ€ κ΅¬κΈ€μ˜ TRC ν”„λ‘œκ·Έλž¨μ˜ μ§€μ›μœΌλ‘œ ν•™μŠ΅ν–ˆμŠ΅λ‹ˆλ‹€.
- Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)