Open Whisper-style Speech Models (OWSM)
Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/
espnet/yodas_owsmv4
Updated • 101 • 3
Note: The filtered YODAS subset used to train the OWSM v4 series.
espnet/owsm_ctc_v4_1B
Automatic Speech Recognition • Updated • 58 • 2
Note: OWSM-CTC v4 is trained on a newly curated dataset from YODAS along with previous OWSM data, which significantly enhances multilingual performance.
espnet/owsm_v4_medium_1B
Automatic Speech Recognition • Updated • 47 • 1
Note: OWSM v4 (1B) is trained on a newly curated dataset from YODAS along with previous OWSM data, which significantly enhances multilingual performance.
espnet/owsm_v4_small_370M
Automatic Speech Recognition • Updated • 23 • 1
Note: OWSM v4 (370M) is trained on a newly curated dataset from YODAS along with previous OWSM data, which significantly enhances multilingual performance.
espnet/owsm_v4_base_102M
Automatic Speech Recognition • Updated • 25 • 1
Note: OWSM v4 (102M) is trained on a newly curated dataset from YODAS along with previous OWSM data, which significantly enhances multilingual performance.
espnet/owsm_ctc_v3.2_ft_1B
Automatic Speech Recognition • Updated • 88 • 4
Note: OWSM-CTC v3.1 is further fine-tuned on v3.2 data to improve long-form robustness.
espnet/owsm_ctc_v3.1_1B
Automatic Speech Recognition • Updated • 46 • 13
Note: (ACL'24) CTC-based non-autoregressive speech foundation model for multilingual ASR, ST, and LID.
espnet/owsm_v3.1_ebf
Automatic Speech Recognition • Updated • 240 • 17
Note: (INTERSPEECH'24) OWSM v3.1 medium with 1.02B parameters.
espnet/owsm_v3.1_ebf_small
Automatic Speech Recognition • Updated • 22 • 2
Note: (INTERSPEECH'24) OWSM v3.1 small with 367M parameters.
espnet/owsm_v3.1_ebf_base
Automatic Speech Recognition • Updated • 19 • 3
Note: (INTERSPEECH'24) OWSM v3.1 base with 101M parameters.
espnet/owsm_v3.1_ebf_small_lowrestriction
Automatic Speech Recognition • Updated • 5 • 2
Note: (INTERSPEECH'24) OWSM v3.1 small trained on a subset of data with low-restriction licenses.
espnet/owsm_v3.2
Automatic Speech Recognition • Updated • 7 • 5
Note: (INTERSPEECH'24) OWSM small with data cleaning.
espnet/owsm_v3
Automatic Speech Recognition • Updated • 10 • 27
espnet/owsm_v2_ebranchformer
Automatic Speech Recognition • Updated • 6
espnet/owsm_v2
Automatic Speech Recognition • Updated • 3 • 4
espnet/owsm_v1
Automatic Speech Recognition • Updated • 3
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
Paper • 2506.00338 • Published • 8
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Paper • 2402.12654 • Published • 1
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Paper • 2401.16658 • Published • 14
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Paper • 2309.13876 • Published • 1
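
Several model notes in this collection state a parameter count (1B, 370M, 102M, 1.02B, 367M, 101M). As a rough sketch of how those figures might be used to choose a checkpoint, the snippet below gathers them into a lookup table and picks the largest model that fits a size budget. The table name `OWSM_PARAMS_M` and the helper `largest_under` are hypothetical and not part of ESPnet or the Hugging Face Hub; models whose note gives no size (for example the CTC variants) are left out rather than guessed.

```python
# Parameter counts in millions, copied from the notes in this collection.
# Assumption: "1B" is treated as 1000M and "1.02B" as 1020M.
OWSM_PARAMS_M = {
    "espnet/owsm_v4_medium_1B": 1000,
    "espnet/owsm_v4_small_370M": 370,
    "espnet/owsm_v4_base_102M": 102,
    "espnet/owsm_v3.1_ebf": 1020,
    "espnet/owsm_v3.1_ebf_small": 367,
    "espnet/owsm_v3.1_ebf_base": 101,
}


def largest_under(budget_m, catalog=OWSM_PARAMS_M):
    """Hypothetical helper: return the repo ID of the largest model with
    at most `budget_m` million parameters, or None if nothing fits."""
    fitting = {repo: size for repo, size in catalog.items() if size <= budget_m}
    if not fitting:
        return None
    # max() over the keys, ranked by their parameter count
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (50, 150, 400, 2000):
        print(budget, largest_under(budget))
```

Parameter count is only a proxy for memory and latency, so the selected repo ID is best treated as a starting point and checked against its model card.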