Hugging Face
diwank's Collections
Audio
updated
about 2 hours ago
espnet/yodas2
Updated
10 days ago
•
12.8k
•
34
Flux9665/BibleMMS
Viewer
•
Updated
Jun 16, 2024
•
736k
•
562
•
67
google/MusicCaps
Viewer
•
Updated
Mar 8, 2023
•
5.52k
•
618
•
134
ShoukanLabs/AniSpeech
Viewer
•
Updated
Jan 29, 2024
•
23.7k
•
401
•
50
aoxo/text2asmr-uncensored
Preview
•
Updated
Feb 19, 2024
•
69
•
7
google/fleurs
Updated
Aug 25, 2024
•
32.5k
•
294
phongdtd/youtube_casual_audio
Updated
Sep 10, 2024
•
43
•
4
ProgramComputer/voxceleb
Updated
Jul 27, 2024
•
1.82k
•
82
jhu-clsp/seamless-align
Preview
•
Updated
Jun 2, 2024
•
1.19k
•
12
IVLLab/MultiDialog
Updated
Aug 29, 2024
•
446
•
20
PetraAI/PetraAI
Updated
Sep 14, 2023
•
112
•
21
ReDUB/SoundHarvest
Viewer
•
Updated
Dec 14, 2023
•
2
•
18
•
2
jhu-clsp/seamless-align-expressive
Updated
Feb 22, 2024
•
130
•
5
jg583/NSynth
Updated
Apr 26, 2024
•
217
•
18
voice-is-cool/voxtube
Viewer
•
Updated
Feb 13, 2024
•
4.46M
•
529
•
16
google/speech_commands
Updated
Jan 18, 2024
•
1.89k
•
44
Fhrozen/FSD50k
Preview
•
Updated
9 days ago
•
1.63k
•
7
nvidia/parakeet-tdt-1.1b
Automatic Speech Recognition
•
Updated
Feb 18
•
6k
•
100
yl4579/StyleTTS2-LibriTTS
Updated
Nov 21, 2023
•
54
coqui/XTTS-v2
Text-to-Speech
•
Updated
Dec 11, 2023
•
2.01M
•
2.7k
facebook/wav2vec2-large-robust
Updated
Nov 5, 2021
•
5.68k
•
35
laion/links_to_pocasts_lecture_and_shows_for_tts
Viewer
•
Updated
May 29, 2024
•
331k
•
11
•
9
laion/youtube-urls-for-emotional-tts
Viewer
•
Updated
May 21, 2024
•
78.3k
•
13
•
3
laion/chirp-v2-dataset
Viewer
•
Updated
Mar 25, 2024
•
64
•
27
•
6
speechcolab/gigaspeech
Viewer
•
Updated
Nov 23, 2023
•
364k
•
16k
•
109
fixie-ai/boolq-audio
Viewer
•
Updated
Jun 12, 2024
•
12.7k
•
190
•
7
fixie-ai/soda-audio
Viewer
•
Updated
Jul 24, 2024
•
102k
•
117
•
4
amphion/Emilia
Preview
•
Updated
Sep 6, 2024
•
23
•
83
google/cvss
Updated
Feb 10, 2024
•
101
•
13
PolyAI/minds14
Updated
Sep 10, 2024
•
3.77k
•
82
Qwen/Qwen2-Audio-7B-Instruct
Audio-Text-to-Text
•
Updated
Jan 12
•
104k
•
448
infgrad/dialogue_rewrite_llm
Viewer
•
Updated
Feb 17, 2024
•
1.64M
•
36
•
14
FBK-MT/Speech-MASSIVE
Viewer
•
Updated
Aug 8, 2024
•
97.6k
•
1.39k
•
40
Qwen/Qwen2-Audio-7B
Audio-Text-to-Text
•
Updated
Nov 20, 2024
•
51.9k
•
121
Mozilla/whisperfile
Updated
Oct 2, 2024
•
533
•
243
vucinatim/spectrogram-captions
Viewer
•
Updated
Jan 3, 2023
•
1k
•
96
•
4
rachit8562/mel_spectogram_bird_audio
Viewer
•
Updated
Jan 7, 2023
•
72.2k
•
14
•
2
novateur/WavTokenizer
Text-to-Speech
•
Updated
Dec 2, 2024
•
52
gpt-omni/mini-omni
Text-to-Speech
•
Updated
Sep 4, 2024
•
3
•
427
amphion/Emilia-Dataset
Viewer
•
Updated
Feb 28
•
54.8M
•
81.1k
•
314
FLUX that Plays Music
Paper
•
2409.00587
•
Published
Sep 1, 2024
•
34
feizhengcong/FluxMusic
Updated
Nov 22, 2024
•
65
fishaudio/fish-speech-1.4
Text-to-Speech
•
Updated
Nov 5, 2024
•
508
•
451
ICTNLP/Llama-3.1-8B-Omni
Updated
Nov 14, 2024
•
349
•
404
HuggingFaceFV/finevideo
Viewer
•
Updated
Dec 16, 2024
•
39.5k
•
3.86k
•
308
kyutai/moshiko-pytorch-bf16
Updated
Sep 18, 2024
•
166k
•
176
kyutai/moshika-pytorch-bf16
Updated
Sep 18, 2024
•
346
•
55
Revai/reverb-asr
Automatic Speech Recognition
•
Updated
Dec 9, 2024
•
12
•
84
FBK-MT/mosel
Viewer
•
Updated
Feb 20
•
57.5M
•
599
•
72
Menlo/llama3-s-instruct-v0.2
Updated
Aug 23, 2024
•
2
•
45
SWivid/F5-TTS
Text-to-Speech
•
Updated
Mar 21
•
826k
•
1.03k
mit-han-lab/hart-0.7b-1024px
Unconditional Image Generation
•
Updated
Nov 17, 2024
•
13
THUDM/glm-4-voice-9b
Updated
Oct 25, 2024
•
2.98k
•
103
amphion/MaskGCT
Text-to-Speech
•
Updated
Apr 13
•
48
•
287
nvidia/parakeet-tdt_ctc-110m
Automatic Speech Recognition
•
Updated
Feb 18
•
34.9k
•
31
nvidia/audio-flamingo
Updated
Oct 2, 2024
•
25
fishaudio/fish-agent-v0.1-3b
Audio-to-Audio
•
Updated
Nov 1, 2024
•
357
•
260
OuteAI/OuteTTS-0.1-350M
Text-to-Speech
•
Updated
Apr 17
•
248
•
301
adamo1139/Meta_Spirit-LM-ungated
Text-to-Audio
•
Updated
Oct 20, 2024
•
18
si-pbc/hertz-dev
Audio-to-Audio
•
Updated
Nov 14, 2024
•
212
pyannote/speech-separation-ami-1.0
Updated
Nov 11, 2024
•
3.01k
•
56
nyuuzyou/suno
Preview
•
Updated
Nov 20, 2024
•
401
•
59
gpt-omni/mini-omni2
Any-to-Any
•
Updated
Oct 24, 2024
•
123
•
271
fixie-ai/ultravox-v0_4_1-llama-3_1-70b
Audio-Text-to-Text
•
Updated
19 days ago
•
86
•
24
aiola/whisper-ner-tag-and-mask-v1
Automatic Speech Recognition
•
Updated
Nov 21, 2024
•
13
•
5
nyrahealth/CrisperWhisper
Automatic Speech Recognition
•
Updated
Dec 19, 2024
•
28.6k
•
289
laion/laions_got_talent
Viewer
•
Updated
Jan 5
•
461k
•
11.4k
•
29
nvidia/se_den_sb_16k_small
Updated
Nov 28, 2024
•
2
nvidia/se_der_sb_16k_small
Updated
Nov 28, 2024
•
2
nvidia/sr_ssl_flowmatching_16k_430m
Updated
Nov 28, 2024
•
7
nvidia/low-frame-rate-speech-codec-22khz
Updated
Dec 12, 2024
•
1.47k
•
13
laion/laion-audio-preview
Viewer
•
Updated
Dec 4, 2024
•
4.15M
•
348
•
11
NexaAIDev/OmniAudio-2.6B
Audio-Text-to-Text
•
Updated
Dec 13, 2024
•
582
•
266
laion/LAION-Audio-300M
Viewer
•
Updated
Jan 10
•
229M
•
9.45k
•
31
hexgrad/Kokoro-82M
Text-to-Speech
•
Updated
Apr 10
•
1.98M
•
4.4k
ByteDance/Make-An-Audio-2
Updated
May 22, 2024
•
13
tincans-ai/pause-asr-alpha
Automatic Speech Recognition
•
Updated
Sep 17, 2024
•
1
•
6
nvidia/bigvgan_v2_44khz_128band_512x
Audio-to-Audio
•
Updated
Sep 5, 2024
•
272k
•
43
speechbrain/sepformer-wham
Audio-to-Audio
•
Updated
Feb 19, 2024
•
1.21k
•
44
blaise-tk/TITAN
Audio-to-Audio
•
Updated
Aug 19, 2024
•
13
•
63
ResembleAI/resemble-enhance
Audio-to-Audio
•
Updated
Dec 21, 2023
•
145
declare-lab/TangoFlux
Text-to-Audio
•
Updated
18 days ago
•
835
•
95
declare-lab/tango-full
Text-to-Audio
•
Updated
Jun 10, 2024
•
10
•
12
declare-lab/mustango
Text-to-Audio
•
Updated
Dec 17, 2023
•
162
•
40
declare-lab/tango2
Text-to-Audio
•
Updated
Apr 16, 2024
•
111
•
17
declare-lab/tango2-full
Text-to-Audio
•
Updated
Dec 29, 2024
•
20
•
9
HKUSTAudio/Llasa-3B
Text-to-Speech
•
Updated
15 days ago
•
2.76k
•
500
fixie-ai/ultravox-v0_4_1-llama-3_3-70b
Audio-Text-to-Text
•
Updated
19 days ago
•
62
•
11
UsefulSensors/moonshine-base
Automatic Speech Recognition
•
Updated
Jan 30
•
6.5k
•
33
UsefulSensors/moonshine
Automatic Speech Recognition
•
Updated
Feb 5
•
61
laion/laions_got_talent_raw
Viewer
•
Updated
Jan 13
•
59k
•
241
•
2
HKUSTAudio/Llasa-8B
Text-to-Speech
•
Updated
Mar 9
•
4.02k
•
94
baichuan-inc/Baichuan-Omni-1d5
Updated
Feb 8
•
175
•
43
m-a-p/YuE-s1-7B-anneal-en-icl
Text Generation
•
Updated
Mar 12
•
3.17k
•
46
m-a-p/YuE-s1-7B-anneal-en-cot
Text Generation
•
Updated
Mar 12
•
14.5k
•
407
unlimitedbytes/hailuo-ai-voices
Viewer
•
Updated
Jan 19
•
68k
•
857
•
6
m-a-p/YuE-s2-1B-general
Text Generation
•
Updated
Mar 12
•
12k
•
51
Zyphra/Zonos-v0.1-speaker-embedding
Updated
Feb 12
•
27
Zyphra/Zonos-v0.1-hybrid
Text-to-Speech
•
Updated
about 5 hours ago
•
13.7k
•
1.07k
FunAudioLLM/InspireMusic-1.5B-24kHz
Updated
Mar 28
•
1
•
6
jadechoghari/VoiceRestore
Audio-to-Audio
•
Updated
Oct 2, 2024
•
31
•
41
stepfun-ai/Step-Audio-Tokenizer
Updated
Feb 18
•
37
stepfun-ai/Step-Audio-TTS-3B
Text-to-Speech
•
Updated
Feb 17
•
236
•
184
stepfun-ai/Step-Audio-Chat
Audio-Text-to-Text
•
Updated
Feb 17
•
185
•
440
Felguk/Felguk-omni-v0
Audio-Text-to-Text
•
Updated
Jan 19
•
7
•
2
livekit/turn-detector
Text Generation
•
Updated
Dec 12, 2024
•
28k
•
55
facebook/jasco-chords-drums-melody-1B
Updated
Mar 13
•
10
HKUSTAudio/Spark-TTS-0.5B
Text-to-Speech
•
Updated
Mar 7
•
8
•
6
ASLP-lab/DiffRhythm-base
Updated
Mar 26
•
85
•
163
SparkAudio/Spark-TTS-0.5B
Text-to-Speech
•
Updated
Mar 7
•
6.68k
•
648
nvidia/audio-flamingo-2-0.5B
Audio-Text-to-Text
•
Updated
Apr 19
•
7
sesame/csm-1b
Text-to-Speech
•
Updated
Mar 16
•
44.2k
•
2.05k
kyutai/mimi
Feature Extraction
•
Updated
Sep 18, 2024
•
901k
•
209
Roblox/voice-safety-classifier
Audio Classification
•
Updated
Jul 8, 2024
•
3.65k
•
38
canopylabs/orpheus-3b-0.1-pretrained
Text-to-Speech
•
Updated
Mar 19
•
19.1k
•
141
ibm-granite/granite-speech-3.2-8b
Automatic Speech Recognition
•
Updated
Apr 16
•
1.18k
•
80
ByteDance/MegaTTS3
Text-to-Speech
•
Updated
Apr 4
•
1.13k
•
371
amphion/Vevo
Text-to-Speech
•
Updated
Apr 13
•
56
•
35
amphion/Vevo1.5
Updated
Apr 13
•
83
•
11
kyutai/DailyTalkContiguous
Preview
•
Updated
Mar 24
•
227
•
8
nvidia/parakeet-tdt-0.6b-v2
Automatic Speech Recognition
•
Updated
3 days ago
•
564k
•
1.06k
ibm-granite/granite-speech-3.3-8b
Automatic Speech Recognition
•
Updated
9 days ago
•
3.01k
•
43
ICTNLP/SLED-TTS-Streaming-Libriheavy
Text-to-Speech
•
Updated
4 days ago
•
8
•
2
ACE-Step/ACE-Step-v1-3.5B
Text-to-Audio
•
Updated
4 days ago
•
466
VITA-MLLM/VITA-Audio-Plus-Vanilla
Updated
19 days ago
•
1.15k
•
4
ICTNLP/InstructS2S-200K
Viewer
•
Updated
7 days ago
•
200k
•
344
•
1
ICTNLP/LLaMA-Omni2-14B
Updated
7 days ago
•
25
•
1
laion/empathic-insights-voice
Updated
7 days ago
•
88
•
1
disco-eth/EuroSpeech
Viewer
•
Updated
7 days ago
•
8.42M
•
27.9k
•
57