wanchichen committed on
Commit dc4d7cf · verified · 1 Parent(s): db94c5a

Create README.md

Files changed (1)
  1. README.md +50 -0
README.md ADDED
@@ -0,0 +1,50 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ - speech-translation
+ language: multilingual
+ datasets:
+ - owsm_v3.1
+ license: cc-by-4.0
+ ---
+
+ ## OWLS: Open Whisper-style Large-scale neural model Suite
+
+ OWLS is a suite of Whisper-style models designed to help researchers understand the scaling properties of speech models.
+
+ OWLS models are developed using [ESPnet](https://github.com/espnet/espnet) and support multilingual speech recognition and translation.
+
+ OWLS is part of the [OWSM](https://www.wavlab.org/activities/2024/owsm/) project, which aims to develop fully open speech foundation models using publicly available data and open-source toolkits.
+
+ The model in this repo has 4.66B parameters in total and was trained on 180k hours of public speech data.
+ Specifically, it supports the following speech-to-text tasks:
+ - Speech recognition
+ - Any-to-any-language speech translation
+ - Utterance-level alignment
+ - Long-form transcription
+ - Language identification
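
The task list above suggests that a task and language are chosen per request. The sketch below shows one way the corresponding prompt symbols might be built. It assumes this checkpoint follows the symbol conventions of earlier OWSM releases (language symbols such as `<eng>`, task symbols such as `<asr>` or `<st_deu>`); the helper `build_symbols` is hypothetical and not part of ESPnet, so check the checkpoint's token list before relying on it.

```python
from typing import Dict, Optional


def build_symbols(src_lang: str, task: str = "asr",
                  tgt_lang: Optional[str] = None) -> Dict[str, str]:
    """Return keyword arguments selecting the spoken language and the task.

    ASSUMPTION: symbols are formatted as in earlier OWSM releases, e.g.
    "<eng>" for the language, "<asr>" for recognition and "<st_deu>" for
    translation into German. This helper is hypothetical.
    """
    if task == "asr":
        # Recognition transcribes in the spoken language, so no other target.
        if tgt_lang not in (None, src_lang):
            raise ValueError("recognition keeps the source language")
        task_sym = "<asr>"
    elif task == "st":
        if tgt_lang is None:
            raise ValueError("translation needs a target language")
        task_sym = f"<st_{tgt_lang}>"
    else:
        raise ValueError(f"unsupported task: {task!r}")
    return {"lang_sym": f"<{src_lang}>", "task_sym": task_sym}


print(build_symbols("eng"))                        # English recognition
print(build_symbols("eng", "st", tgt_lang="deu"))  # English speech to German text
```

If the loaded `Speech2Text` object accepts `lang_sym` and `task_sym` at call time, as earlier OWSM checkpoints do, the returned dictionary could be passed along as `model(speech, **build_symbols("eng", "st", tgt_lang="deu"))`.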
+
+ ## Use this model
+
+ You can use this model in your projects with the following code:
+
+ ```python
+ # make sure espnet is installed: pip install espnet
+ import soundfile
+ from espnet2.bin.s2t_inference import Speech2Text
+
+ # download the pretrained checkpoint and build the inference wrapper
+ model = Speech2Text.from_pretrained("espnet/owsm_v3.2")
+
+ speech, rate = soundfile.read("speech.wav")
+ text, *_ = model(speech)[0]  # first (best) hypothesis; its first field is the text
+ ```
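
The snippet above reads the sampling rate but never checks it. As a minimal sketch, the file header can be inspected before inference using only the standard library. The 16 kHz mono expectation is an assumption carried over from earlier OWSM releases, not something stated in this README, and `describe_wav` is a hypothetical helper.

```python
import os
import tempfile
import wave

# ASSUMPTION: the model expects 16 kHz mono audio, as earlier OWSM releases do.
EXPECTED_RATE = 16000


def describe_wav(path: str) -> dict:
    """Read the header of a PCM WAV file and report basic properties."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "rate": rate,
        "channels": channels,
        "seconds": frames / rate,
        "needs_resampling": rate != EXPECTED_RATE,
        "needs_downmix": channels != 1,
    }


# Demo: write one second of 8 kHz stereo silence, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)   # 16-bit samples
    wav.setframerate(8000)
    wav.writeframes(b"\x00" * (8000 * 2 * 2))  # frames * channels * bytes per sample

print(describe_wav(demo_path))
```

A file flagged with `needs_resampling` or `needs_downmix` would have to be converted before being passed to the model; the conversion itself is left to whichever audio library the project already uses.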
+
+
+ ## Citations
+
+ TBA
+
+