---
library_name: transformers
tags: []
---

### Model Description

This model was trained from OpenAI's whisper-base model on the datasets below. It was trained using the phonetic form of the transcripts.
- Korean Speech / 한국어 음성 (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=123)
- Address Speech Data / 주소 음성 데이터 (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=71556)
- Meeting Speech Recognition Data by Major Domain / 주요 영역별 회의 음성인식 데이터 (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=data&dataSetSn=464)
- Low-Quality Telephone Network Speech Recognition Data / 저음질 전화망 음성인식 데이터 (https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=571)
- Broadcast Content Conversational Speech Recognition Data / 방송 콘텐츠 대화체 음성인식 데이터 (https://www.aihub.or.kr/aihubdata/data/view.do?dataSetSn=463)

```
train_steps: 20000
warmup_steps: 2000
lr scheduler: linear warmup cosine decay
max learning rate: 1e-4
batch size: 256
max_grad_norm: 1.0
adamw_beta1: 0.9
adamw_beta2: 0.98
```
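
The schedule named in the hyperparameters above (linear warmup followed by cosine decay) can be sketched in plain Python. This is only an illustration, not the training code: it assumes the learning rate starts at 0, that the 2,000 warmup steps count toward the 20,000 total, and that the cosine decay ends at 0. The function name `learning_rate` is hypothetical.

```python
import math

# Values taken from the hyperparameter block in this card.
TRAIN_STEPS = 20_000
WARMUP_STEPS = 2_000
MAX_LR = 1e-4


def learning_rate(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay.

    Assumptions (not stated in the card): warmup starts from 0,
    warmup steps are part of TRAIN_STEPS, and decay ends at 0.
    """
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to MAX_LR.
        return MAX_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((step - WARMUP_STEPS) / (TRAIN_STEPS - WARMUP_STEPS), 1.0)
    # Half-cosine from MAX_LR down to 0.
    return MAX_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 1_000, 2_000, 11_000, 20_000):
    print(f"step {s:>6}: lr = {learning_rate(s):.2e}")
```

Under these assumptions the rate peaks at step 2,000 and falls to half the peak at step 11,000, the midpoint of the decay phase.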

### Evaluation

https://github.com/rtzr/Awesome-Korean-Speech-Recognition

μœ„ λ ˆν¬μ§€ν† λ¦¬μ—μ„œ μ£Όμš” μ˜μ—­λ³„ 회의 μŒμ„±μ„ μ œμ™Έν•œ ν…ŒμŠ€νŠΈμ…‹ κ²°κ³Όμž…λ‹ˆλ‹€. μ•„λž˜ ν…Œμ΄λΈ”μ—μ„œ whisper_base_komixv2_phnκ°€ λ³Έ λͺ¨λΈ μ„±λŠ₯μž…λ‹ˆλ‹€.


|         Model         | cv_15_ko | fleurs_ko | kcall_testset | kconf_test | kcounsel_test | klec_testset | kspon_clean | kspon_other |
|-----------------------|----------|-----------|---------------|------------|---------------|--------------|-------------|-------------|
|       whisper_base       |  21.16   |   11.89   |     42.56     |   27.62    |     22.24     |    28.65     |    30.41    |    27.02    |
|    whisper_base_komix    |  15.42   |    7.16   |     20.86     |   14.24    |     12.64     |    13.44     |    12.26    |    12.12    |
|   whisper_base_komixv2   |  13.04   |    7.04   |     10.54     |    13.1    |     10.65     |    12.99     |    12.44    |    12.56    |
| whisper_base_komixv2_phn |  12.81   |    8.27   |      9.5      |   13.26    |     11.33     |    14.24     |    13.11    |     13.3    |
|     whisper_large_v3     |   5.11   |    3.72   |      5.45     |    9.35    |      3.83     |     8.46     |    15.08    |    12.89    |
|  whisper_large_v3_turbo  |   5.38   |    3.95   |      5.89     |    9.77    |      4.21     |     9.27     |    16.49    |    13.54    |

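To put the table in perspective, the relative change from the `whisper_base` baseline to this model can be computed per test set. The sketch below assumes the scores are error rates in percent where lower is better; the card itself does not name the metric. The helper `relative_reduction` is a hypothetical name, and the numbers are copied from the table.

```python
# Scores copied from the table above (metric assumed to be an error rate in %).
TEST_SETS = [
    "cv_15_ko", "fleurs_ko", "kcall_testset", "kconf_test",
    "kcounsel_test", "klec_testset", "kspon_clean", "kspon_other",
]
WHISPER_BASE = [21.16, 11.89, 42.56, 27.62, 22.24, 28.65, 30.41, 27.02]
KOMIXV2_PHN = [12.81, 8.27, 9.5, 13.26, 11.33, 14.24, 13.11, 13.3]


def relative_reduction(baseline: float, model: float) -> float:
    """Percentage by which `model` lowers the score relative to `baseline`."""
    return (baseline - model) / baseline * 100.0


for name, base, phn in zip(TEST_SETS, WHISPER_BASE, KOMIXV2_PHN):
    print(f"{name:<14} {base:>6.2f} -> {phn:>6.2f}  "
          f"({relative_reduction(base, phn):.1f}% lower)")
```

On every test set listed, this model scores lower than `whisper_base`, with the largest relative drop on `kcall_testset`.
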
### Acknowledgement
- λ³Έ λͺ¨λΈμ€ κ΅¬κΈ€μ˜ TRC ν”„λ‘œκ·Έλž¨μ˜ μ§€μ›μœΌλ‘œ ν•™μŠ΅ν–ˆμŠ΅λ‹ˆλ‹€.
- Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)