Open Whisper-style Speech Models (OWSM)

pyf98's Collections

Updated 5 days ago

Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/

OWSM Demo

espnet/yodas_owsmv4

Updated 4 days ago

Note: The filtered YODAS subset used to train the OWSM v4 series.

espnet/owsm_ctc_v4_1B

Automatic Speech Recognition • Updated about 2 hours ago

Note: OWSM-CTC v4 is trained on a newly curated dataset from YODAS along with previous OWSM data, which significantly enhances multilingual performance.

espnet/owsm_v4_medium_1B

Automatic Speech Recognition • Updated 4 days ago

Note: OWSM v4 (1B) is trained on a newly curated dataset from YODAS along with previous OWSM data, which significantly enhances multilingual performance.

espnet/owsm_v4_small_370M

Automatic Speech Recognition • Updated about 2 hours ago

Note: OWSM v4 (370M) is trained on a newly curated dataset from YODAS along with previous OWSM data, which significantly enhances multilingual performance.

espnet/owsm_v4_base_102M

Automatic Speech Recognition • Updated 4 days ago

Note: OWSM v4 (102M) is trained on a newly curated dataset from YODAS along with previous OWSM data, which significantly enhances multilingual performance.

espnet/owsm_ctc_v3.2_ft_1B

Automatic Speech Recognition • Updated 19 days ago

Note: OWSM-CTC v3.1 further fine-tuned on OWSM v3.2 data to improve robustness on long-form audio.

espnet/owsm_ctc_v3.1_1B

Automatic Speech Recognition • Updated 19 days ago

Note: (ACL'24) CTC-based non-autoregressive speech foundation model for multilingual automatic speech recognition (ASR), speech translation (ST), and language identification (LID).

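To illustrate what "CTC-based non-autoregressive" means, here is a minimal sketch of generic greedy CTC decoding: the label for every frame is chosen independently in a single pass, then repeated labels are merged and blank tokens dropped. This is not OWSM-CTC's actual decoder. The blank index 0, the function name `ctc_greedy_decode`, and the toy vocabulary are hypothetical, chosen for illustration only.

```python
from typing import List, Optional, Sequence

BLANK = 0  # hypothetical: assume index 0 is the CTC blank token


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding: best label per frame, collapse repeats, drop blanks."""
    # Non-autoregressive step: each frame's best label is picked independently,
    # with no dependence on previously emitted tokens.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded: List[int] = []
    prev: Optional[int] = None
    for label in best_path:
        # Emit a label only if it is not blank and differs from the previous frame.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Toy example: 6 frames over a hypothetical vocabulary {0: blank, 1: "a", 2: "b"}.
vocab = {1: "a", 2: "b"}
scores = [
    [0.1, 0.8, 0.1],    # "a"
    [0.1, 0.7, 0.2],    # "a" again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank, separates the two "a" tokens
    [0.2, 0.6, 0.2],    # "a", a new token because a blank came before it
    [0.1, 0.1, 0.8],    # "b"
    [0.7, 0.2, 0.1],    # blank
]
ids = ctc_greedy_decode(scores)
print("".join(vocab[i] for i in ids))  # aab
```

The blank is what lets CTC output genuinely repeated tokens ("aa") while still merging a single token that spans several frames.
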
espnet/owsm_v3.1_ebf

Automatic Speech Recognition • Updated 19 days ago

Note: (INTERSPEECH'24) OWSM v3.1 medium with 1.02B parameters.

espnet/owsm_v3.1_ebf_small

Automatic Speech Recognition • Updated 19 days ago

Note: (INTERSPEECH'24) OWSM v3.1 small with 367M parameters.

espnet/owsm_v3.1_ebf_base

Automatic Speech Recognition • Updated 19 days ago

Note: (INTERSPEECH'24) OWSM v3.1 base with 101M parameters.

espnet/owsm_v3.1_ebf_small_lowrestriction

Automatic Speech Recognition • Updated Mar 27

Note: (INTERSPEECH'24) OWSM v3.1 small trained on a subset of the data with low-restriction licenses.

espnet/owsm_v3.2

Automatic Speech Recognition • Updated Aug 26, 2024

Note: (INTERSPEECH'24) OWSM v3.2 small, trained on cleaned data.

espnet/owsm_v3

Automatic Speech Recognition • Updated Feb 6

espnet/owsm_v2_ebranchformer

Automatic Speech Recognition • Updated Oct 30, 2023

espnet/owsm_v2

Automatic Speech Recognition • Updated Jul 29, 2023

espnet/owsm_v1

Automatic Speech Recognition • Updated Oct 19, 2023

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning

Paper • 2506.00338 • Published 8 days ago

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Paper • 2402.12654 • Published Feb 20, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Paper • 2401.16658 • Published Jan 30, 2024

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Paper • 2309.13876 • Published Sep 25, 2023