espnet
/

owls_4B_180K

Automatic Speech Recognition

speech-translation


wanchichen committed on Feb 14

Commit

dc4d7cf

·

verified ·

1 Parent(s): db94c5a

Create README.md

Files changed (1)

README.md +50 -0

README.md ADDED

	@@ -0,0 +1,50 @@

+---
+tags:
+- espnet
+- audio
+- automatic-speech-recognition
+- speech-translation
+language: multilingual
+datasets:
+- owsm_v3.1
+license: cc-by-4.0
+---
+## OWLS: Open Whisper-style Large-scale neural model Suite
+OWLS is a suite of Whisper-style models, designed to help researchers understand the scaling properties of speech models.
+OWLS models are developed using [ESPnet](https://github.com/espnet/espnet) and support multilingual speech recognition and translation.
+OWLS is part of the [OWSM](https://www.wavlab.org/activities/2024/owsm/) project, which aims to develop fully open speech foundation models using publicly available data and open-source toolkits.
+The model in this repo has 4.66B parameters in total and is trained on 180k hours of public speech data.
+Specifically, it supports the following speech-to-text tasks:
+- Speech recognition
+- Any-to-any-language speech translation
+- Utterance-level alignment
+- Long-form transcription
+- Language identification
+## Use this model
+You can use this model in your projects with the following code:
+```python
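+# make sure the dependencies are installed: pip install espnet espnet_model_zoo soundfile
+import soundfile
+from espnet2.bin.s2t_inference import Speech2Text
+
+# load the pretrained checkpoint hosted in this repo
+model = Speech2Text.from_pretrained("espnet/owls_4B_180K")
+# read the waveform; `rate` is the sample rate stored in the file
+speech, rate = soundfile.read("speech.wav")
+# take the decoded text of the top hypothesis
+text, *_ = model(speech)[0]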
+```
+## Citations
+TBA