---
language: "de"
tags:
- German
- KKY
- FAV
license: "cc-by-nc-sa-4.0"
---

# wav2vec2-base-de-50k
This is a monolingual German Wav2Vec 2.0 base model pre-trained on 50 thousand hours of German speech.
It was released along with the paper **A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for
Automatic Speech Recognition in Multilingual Oral History Archives**, accepted at the INTERSPEECH 2024 conference.

## Paper
The pre-print of our paper is available at http://arxiv.org/abs/2407.17160.

### All pre-trained models released along with the paper
- [fav-kky/wav2vec2-base-cs-50k](https://huggingface.co/fav-kky/wav2vec2-base-cs-50k) (monolingual Czech)
- [fav-kky/wav2vec2-base-de-50k](https://huggingface.co/fav-kky/wav2vec2-base-de-50k) (monolingual German)
- [fav-kky/wav2vec2-base-cs-en-100k](https://huggingface.co/fav-kky/wav2vec2-base-cs-en-100k) (bilingual Czech+English)
- [fav-kky/wav2vec2-base-cs-de-100k](https://huggingface.co/fav-kky/wav2vec2-base-cs-de-100k) (bilingual Czech+German)
- [fav-kky/wav2vec2-base-en-de-100k](https://huggingface.co/fav-kky/wav2vec2-base-en-de-100k) (bilingual English+German)
- [fav-kky/wav2vec2-base-cs-en-de-150k](https://huggingface.co/fav-kky/wav2vec2-base-cs-en-de-150k) (trilingual Czech+English+German)

## Citation
If you find this model useful, please cite our paper:
```bibtex
@inproceedings{lehecka2024bitrilingual,
  title = {{A Comparative Analysis of Bilingual and Trilingual Wav2Vec Models for Automatic Speech Recognition in Multilingual Oral History Archives}},
  author = {Jan Lehe\v{c}ka and
            Josef V. Psutka and
            Lubo\v{s} \v{S}m\'{i}dl and
            Pavel Ircing and
            Josef Psutka},
  booktitle = {Proc. Interspeech 2024},
  note = {In Press},
  year = {2024},
  url = {https://arxiv.org/abs/2407.17160},
}
```

## Usage
This model does not have a tokenizer, as it was pre-trained on audio alone.
To use it for speech recognition, a tokenizer must be created
and the model must be [fine-tuned](https://huggingface.co/blog/fine-tune-wav2vec2-english) on labeled ASR data.

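The tokenizer for fine-tuning is typically built from the character set of your transcripts. A minimal sketch of that step, following the conventions of the linked blog post (the `transcripts` list and the `vocab.json` filename are illustrative assumptions):

```python
import json

# Hypothetical labeled transcripts for fine-tuning
transcripts = ["guten tag", "wie geht es dir"]

# Collect the character set; "|" conventionally replaces the space
# as the word delimiter in Wav2Vec2 CTC tokenizers
chars = sorted(set("".join(transcripts).replace(" ", "|")))
vocab = {c: i for i, c in enumerate(chars)}
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)

# Save the vocabulary so it can be loaded by Wav2Vec2CTCTokenizer
with open("vocab.json", "w") as f:
    json.dump(vocab, f)
```

The resulting file can then be passed to `Wav2Vec2CTCTokenizer` when setting up fine-tuning.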
Inputs must be 16 kHz mono audio files.

This model can be used, e.g., to extract per-frame contextual embeddings from audio:
```python
import torch
import torchaudio
from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("fav-kky/wav2vec2-base-de-50k")
model = Wav2Vec2Model.from_pretrained("fav-kky/wav2vec2-base-de-50k")
model.eval()

# Load the audio (expected to be 16 kHz mono)
speech_array, sampling_rate = torchaudio.load("/path/to/audio/file.wav")

inputs = feature_extractor(
    speech_array.squeeze().numpy(),
    sampling_rate=16_000,
    return_tensors="pt",
).input_values

with torch.no_grad():
    output = model(inputs)

# Shape: (num_frames, hidden_size)
embeddings = output.last_hidden_state[0].numpy()
```
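The number of frames returned above is determined by the model's convolutional feature encoder (total stride 320, i.e. roughly one embedding per 20 ms of 16 kHz audio). A sketch of that mapping, assuming the standard wav2vec2-base encoder configuration:

```python
# (kernel size, stride) of each layer in the standard wav2vec2-base
# feature encoder; the strides multiply to a total downsampling of 320
conv_layers = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

def num_output_frames(num_samples: int) -> int:
    """Number of per-frame embeddings produced for a raw 16 kHz waveform."""
    length = num_samples
    for kernel, stride in conv_layers:
        length = (length - kernel) // stride + 1
    return length

# One second of 16 kHz audio yields 49 frames (~20 ms per frame)
print(num_output_frames(16_000))  # 49
```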

## Related works