kerlos127 committed afee768 · verified · 1 parent: 101c34e

Update README.md

Files changed (1):
  1. README.md (+111 −3)
README.md CHANGED
@@ -1,3 +1,111 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language:
+ - th
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - whisper-event
+ - generated_from_trainer
+ datasets:
+ - mozilla-foundation/common_voice_13_0
+ - google/fleurs
+ metrics:
+ - wer
+ base_model: openai/whisper-medium
+ model-index:
+ - name: Whisper Medium Thai Combined V4 - biodatlab
+   results:
+   - task:
+       type: automatic-speech-recognition
+       name: Automatic Speech Recognition
+     dataset:
+       name: mozilla-foundation/common_voice_13_0 th
+       type: mozilla-foundation/common_voice_13_0
+       config: th
+       split: test
+       args: th
+     metrics:
+     - type: wer
+       value: 7.42
+       name: Wer
+ ---
32
+
33
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
34
+ should probably proofread and complete it, then remove this comment. -->
35
+
36
+ # Whisper Medium (Thai): Combined V3
37
+
38
+ This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on augmented versions of the mozilla-foundation/common_voice_13_0 th, google/fleurs, and curated datasets.
39
+ It achieves the following results on the common-voice-13 test set:
40
+ - WER: 7.42 (with Deepcut Tokenizer)
41
+
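+ The WER above is computed on Deepcut-segmented text. As a rough illustration only (the card does not include its evaluation script, so the use of `deepcut` and `evaluate` below is an assumption), such a tokenizer-aware WER could be computed like this:
+
+ ```py
+ # Hypothetical evaluation sketch: segment Thai text with Deepcut before scoring,
+ # so WER is measured over words rather than over unsegmented sentences.
+ # Requires: pip install deepcut evaluate jiwer
+ import deepcut
+ import evaluate
+
+ wer_metric = evaluate.load("wer")
+
+ def segment(text: str) -> str:
+     # deepcut.tokenize returns a list of Thai words; join with spaces for WER.
+     return " ".join(deepcut.tokenize(text))
+
+ references = ["..."]   # ground-truth transcripts (placeholders)
+ predictions = ["..."]  # model outputs (placeholders)
+
+ wer = wer_metric.compute(
+     predictions=[segment(p) for p in predictions],
+     references=[segment(r) for r in references],
+ )
+ print(f"WER: {wer:.4f}")
+ ```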
42
+ ## Model description
43
+
44
+ Use the model with huggingface's `transformers` as follows:
45
+
46
+ ```py
47
+ from transformers import pipeline
48
+
49
+ MODEL_NAME = "biodatlab/whisper-th-medium-combined" # specify the model name
50
+ lang = "th" # change to Thai langauge
51
+
52
+ device = 0 if torch.cuda.is_available() else "cpu"
53
+
54
+ pipe = pipeline(
55
+ task="automatic-speech-recognition",
56
+ model=MODEL_NAME,
57
+ chunk_length_s=30,
58
+ device=device,
59
+ )
60
+ pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(
61
+ language=lang,
62
+ task="transcribe"
63
+ )
64
+ text = pipe("audio.mp3")["text"] # give audio mp3 and transcribe text
65
+ ```
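+
+ If you prefer not to use the pipeline, a minimal sketch with the lower-level `WhisperProcessor` / `WhisperForConditionalGeneration` API is shown below. The file name and librosa-based loading are illustrative; this path does not chunk long audio, so it suits clips of up to about 30 seconds:
+
+ ```py
+ import torch
+ import librosa
+ from transformers import WhisperProcessor, WhisperForConditionalGeneration
+
+ model_name = "biodatlab/whisper-th-medium-combined"
+ processor = WhisperProcessor.from_pretrained(model_name)
+ model = WhisperForConditionalGeneration.from_pretrained(model_name)
+ model.to("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Whisper expects 16 kHz mono audio.
+ audio, _ = librosa.load("audio.mp3", sr=16000)
+ inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(model.device)
+
+ # Multilingual Whisper checkpoints accept language/task directly in generate().
+ generated_ids = model.generate(inputs.input_features, language="th", task="transcribe")
+ text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(text)
+ ```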
66
+
67
+
68
+ ## Intended uses & limitations
69
+
70
+ More information needed
71
+
72
+ ## Training and evaluation data
73
+
74
+ More information needed
75
+
76
+ ## Training procedure
77
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 1e-05
+ - train_batch_size: 16
+ - eval_batch_size: 16
+ - seed: 42
+ - optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 500
+ - training_steps: 10000
+ - mixed_precision_training: Native AMP
+
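+ The hyperparameters above correspond roughly to the `Seq2SeqTrainingArguments` sketched below. This is a hedged illustration, not the actual training script (which is not part of this card); the output directory is a placeholder, and AdamW with the listed betas/epsilon is the library default optimizer:
+
+ ```py
+ # Hypothetical sketch mapping the listed hyperparameters onto transformers'
+ # Seq2SeqTrainingArguments; the real training setup may differ.
+ from transformers import Seq2SeqTrainingArguments
+
+ training_args = Seq2SeqTrainingArguments(
+     output_dir="./whisper-th-medium-combined",  # illustrative path
+     learning_rate=1e-5,
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     seed=42,
+     lr_scheduler_type="linear",
+     warmup_steps=500,
+     max_steps=10000,
+     fp16=True,  # "Native AMP" mixed precision
+ )
+ ```
+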
+ ### Framework versions
+
+ - Transformers 4.37.2
+ - PyTorch 2.1.0
+ - Datasets 2.16.1
+ - Tokenizers 0.15.1
+
+ ## Citation
+
+ Cite using BibTeX:
+
+ ```bibtex
+ @misc{thonburian_whisper_med,
+   author = { Atirut Boribalburephan, Zaw Htet Aung, Knot Pipatsrisawat, Titipat Achakulvisut },
+   title = { Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition },
+   year = 2022,
+   url = { https://huggingface.co/biodatlab/whisper-th-medium-combined },
+   doi = { 10.57967/hf/0226 },
+   publisher = { Hugging Face }
+ }
+ ```