Audio - a diwank Collection

Audio

updated about 2 hours ago

espnet/yodas2

Updated 10 days ago • 12.8k • 34
Flux9665/BibleMMS

Viewer • Updated Jun 16, 2024 • 736k • 562 • 67
google/MusicCaps

Viewer • Updated Mar 8, 2023 • 5.52k • 618 • 134
ShoukanLabs/AniSpeech

Viewer • Updated Jan 29, 2024 • 23.7k • 401 • 50
aoxo/text2asmr-uncensored

Preview • Updated Feb 19, 2024 • 69 • 7
google/fleurs

Updated Aug 25, 2024 • 32.5k • 294
phongdtd/youtube_casual_audio

Updated Sep 10, 2024 • 43 • 4
ProgramComputer/voxceleb

Updated Jul 27, 2024 • 1.82k • 82
jhu-clsp/seamless-align

Preview • Updated Jun 2, 2024 • 1.19k • 12
IVLLab/MultiDialog

Updated Aug 29, 2024 • 446 • 20
PetraAI/PetraAI

Updated Sep 14, 2023 • 112 • 21
ReDUB/SoundHarvest

Viewer • Updated Dec 14, 2023 • 2 • 18 • 2
jhu-clsp/seamless-align-expressive

Updated Feb 22, 2024 • 130 • 5
jg583/NSynth

Updated Apr 26, 2024 • 217 • 18
voice-is-cool/voxtube

Viewer • Updated Feb 13, 2024 • 4.46M • 529 • 16
google/speech_commands

Updated Jan 18, 2024 • 1.89k • 44
Fhrozen/FSD50k

Preview • Updated 9 days ago • 1.63k • 7
nvidia/parakeet-tdt-1.1b

Automatic Speech Recognition • Updated Feb 18 • 6k • 100
yl4579/StyleTTS2-LibriTTS

Updated Nov 21, 2023 • 54
coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 2.01M • 2.7k
facebook/wav2vec2-large-robust

Updated Nov 5, 2021 • 5.68k • 35
laion/links_to_pocasts_lecture_and_shows_for_tts

Viewer • Updated May 29, 2024 • 331k • 11 • 9
laion/youtube-urls-for-emotional-tts

Viewer • Updated May 21, 2024 • 78.3k • 13 • 3
laion/chirp-v2-dataset

Viewer • Updated Mar 25, 2024 • 64 • 27 • 6
speechcolab/gigaspeech

Viewer • Updated Nov 23, 2023 • 364k • 16k • 109
fixie-ai/boolq-audio

Viewer • Updated Jun 12, 2024 • 12.7k • 190 • 7
fixie-ai/soda-audio

Viewer • Updated Jul 24, 2024 • 102k • 117 • 4
amphion/Emilia

Preview • Updated Sep 6, 2024 • 23 • 83
google/cvss

Updated Feb 10, 2024 • 101 • 13
PolyAI/minds14

Updated Sep 10, 2024 • 3.77k • 82
Qwen/Qwen2-Audio-7B-Instruct

Audio-Text-to-Text • Updated Jan 12 • 104k • 448
infgrad/dialogue_rewrite_llm

Viewer • Updated Feb 17, 2024 • 1.64M • 36 • 14
FBK-MT/Speech-MASSIVE

Viewer • Updated Aug 8, 2024 • 97.6k • 1.39k • 40
Qwen/Qwen2-Audio-7B

Audio-Text-to-Text • Updated Nov 20, 2024 • 51.9k • 121
Mozilla/whisperfile

Updated Oct 2, 2024 • 533 • 243
vucinatim/spectrogram-captions

Viewer • Updated Jan 3, 2023 • 1k • 96 • 4
rachit8562/mel_spectogram_bird_audio

Viewer • Updated Jan 7, 2023 • 72.2k • 14 • 2
novateur/WavTokenizer

Text-to-Speech • Updated Dec 2, 2024 • 52
gpt-omni/mini-omni

Text-to-Speech • Updated Sep 4, 2024 • 3 • 427
amphion/Emilia-Dataset

Viewer • Updated Feb 28 • 54.8M • 81.1k • 314
FLUX that Plays Music

Paper • 2409.00587 • Published Sep 1, 2024 • 34
feizhengcong/FluxMusic

Updated Nov 22, 2024 • 65
fishaudio/fish-speech-1.4

Text-to-Speech • Updated Nov 5, 2024 • 508 • 451
ICTNLP/Llama-3.1-8B-Omni

Updated Nov 14, 2024 • 349 • 404
HuggingFaceFV/finevideo

Viewer • Updated Dec 16, 2024 • 39.5k • 3.86k • 308
kyutai/moshiko-pytorch-bf16

Updated Sep 18, 2024 • 166k • 176
kyutai/moshika-pytorch-bf16

Updated Sep 18, 2024 • 346 • 55
Revai/reverb-asr

Automatic Speech Recognition • Updated Dec 9, 2024 • 12 • 84
FBK-MT/mosel

Viewer • Updated Feb 20 • 57.5M • 599 • 72
Menlo/llama3-s-instruct-v0.2

Updated Aug 23, 2024 • 2 • 45
SWivid/F5-TTS

Text-to-Speech • Updated Mar 21 • 826k • 1.03k
mit-han-lab/hart-0.7b-1024px

Unconditional Image Generation • Updated Nov 17, 2024 • 13
THUDM/glm-4-voice-9b

Updated Oct 25, 2024 • 2.98k • 103
amphion/MaskGCT

Text-to-Speech • Updated Apr 13 • 48 • 287
nvidia/parakeet-tdt_ctc-110m

Automatic Speech Recognition • Updated Feb 18 • 34.9k • 31
nvidia/audio-flamingo

Updated Oct 2, 2024 • 25
fishaudio/fish-agent-v0.1-3b

Audio-to-Audio • Updated Nov 1, 2024 • 357 • 260
OuteAI/OuteTTS-0.1-350M

Text-to-Speech • Updated Apr 17 • 248 • 301
adamo1139/Meta_Spirit-LM-ungated

Text-to-Audio • Updated Oct 20, 2024 • 18
si-pbc/hertz-dev

Audio-to-Audio • Updated Nov 14, 2024 • 212
pyannote/speech-separation-ami-1.0

Updated Nov 11, 2024 • 3.01k • 56
nyuuzyou/suno

Preview • Updated Nov 20, 2024 • 401 • 59
gpt-omni/mini-omni2

Any-to-Any • Updated Oct 24, 2024 • 123 • 271
fixie-ai/ultravox-v0_4_1-llama-3_1-70b

Audio-Text-to-Text • Updated 19 days ago • 86 • 24
aiola/whisper-ner-tag-and-mask-v1

Automatic Speech Recognition • Updated Nov 21, 2024 • 13 • 5
nyrahealth/CrisperWhisper

Automatic Speech Recognition • Updated Dec 19, 2024 • 28.6k • 289
laion/laions_got_talent

Viewer • Updated Jan 5 • 461k • 11.4k • 29
nvidia/se_den_sb_16k_small

Updated Nov 28, 2024 • 2
nvidia/se_der_sb_16k_small

Updated Nov 28, 2024 • 2
nvidia/sr_ssl_flowmatching_16k_430m

Updated Nov 28, 2024 • 7
nvidia/low-frame-rate-speech-codec-22khz

Updated Dec 12, 2024 • 1.47k • 13
laion/laion-audio-preview

Viewer • Updated Dec 4, 2024 • 4.15M • 348 • 11
NexaAIDev/OmniAudio-2.6B

Audio-Text-to-Text • Updated Dec 13, 2024 • 582 • 266
laion/LAION-Audio-300M

Viewer • Updated Jan 10 • 229M • 9.45k • 31
hexgrad/Kokoro-82M

Text-to-Speech • Updated Apr 10 • 1.98M • 4.4k
ByteDance/Make-An-Audio-2

Updated May 22, 2024 • 13
tincans-ai/pause-asr-alpha

Automatic Speech Recognition • Updated Sep 17, 2024 • 1 • 6
nvidia/bigvgan_v2_44khz_128band_512x

Audio-to-Audio • Updated Sep 5, 2024 • 272k • 43
speechbrain/sepformer-wham

Audio-to-Audio • Updated Feb 19, 2024 • 1.21k • 44
blaise-tk/TITAN

Audio-to-Audio • Updated Aug 19, 2024 • 13 • 63
ResembleAI/resemble-enhance

Audio-to-Audio • Updated Dec 21, 2023 • 145
declare-lab/TangoFlux

Text-to-Audio • Updated 18 days ago • 835 • 95
declare-lab/tango-full

Text-to-Audio • Updated Jun 10, 2024 • 10 • 12
declare-lab/mustango

Text-to-Audio • Updated Dec 17, 2023 • 162 • 40
declare-lab/tango2

Text-to-Audio • Updated Apr 16, 2024 • 111 • 17
declare-lab/tango2-full

Text-to-Audio • Updated Dec 29, 2024 • 20 • 9
HKUSTAudio/Llasa-3B

Text-to-Speech • Updated 15 days ago • 2.76k • 500
fixie-ai/ultravox-v0_4_1-llama-3_3-70b

Audio-Text-to-Text • Updated 19 days ago • 62 • 11
UsefulSensors/moonshine-base

Automatic Speech Recognition • Updated Jan 30 • 6.5k • 33
UsefulSensors/moonshine

Automatic Speech Recognition • Updated Feb 5 • 61
laion/laions_got_talent_raw

Viewer • Updated Jan 13 • 59k • 241 • 2
HKUSTAudio/Llasa-8B

Text-to-Speech • Updated Mar 9 • 4.02k • 94
baichuan-inc/Baichuan-Omni-1d5

Updated Feb 8 • 175 • 43
m-a-p/YuE-s1-7B-anneal-en-icl

Text Generation • Updated Mar 12 • 3.17k • 46
m-a-p/YuE-s1-7B-anneal-en-cot

Text Generation • Updated Mar 12 • 14.5k • 407
unlimitedbytes/hailuo-ai-voices

Viewer • Updated Jan 19 • 68k • 857 • 6
m-a-p/YuE-s2-1B-general

Text Generation • Updated Mar 12 • 12k • 51
Zyphra/Zonos-v0.1-speaker-embedding

Updated Feb 12 • 27
Zyphra/Zonos-v0.1-hybrid

Text-to-Speech • Updated about 5 hours ago • 13.7k • 1.07k
FunAudioLLM/InspireMusic-1.5B-24kHz

Updated Mar 28 • 1 • 6
jadechoghari/VoiceRestore

Audio-to-Audio • Updated Oct 2, 2024 • 31 • 41
stepfun-ai/Step-Audio-Tokenizer

Updated Feb 18 • 37
stepfun-ai/Step-Audio-TTS-3B

Text-to-Speech • Updated Feb 17 • 236 • 184
stepfun-ai/Step-Audio-Chat

Audio-Text-to-Text • Updated Feb 17 • 185 • 440
Felguk/Felguk-omni-v0

Audio-Text-to-Text • Updated Jan 19 • 7 • 2
livekit/turn-detector

Text Generation • Updated Dec 12, 2024 • 28k • 55
facebook/jasco-chords-drums-melody-1B

Updated Mar 13 • 10
HKUSTAudio/Spark-TTS-0.5B

Text-to-Speech • Updated Mar 7 • 8 • 6
ASLP-lab/DiffRhythm-base

Updated Mar 26 • 85 • 163
SparkAudio/Spark-TTS-0.5B

Text-to-Speech • Updated Mar 7 • 6.68k • 648
nvidia/audio-flamingo-2-0.5B

Audio-Text-to-Text • Updated Apr 19 • 7
sesame/csm-1b

Text-to-Speech • Updated Mar 16 • 44.2k • 2.05k
kyutai/mimi

Feature Extraction • Updated Sep 18, 2024 • 901k • 209
Roblox/voice-safety-classifier

Audio Classification • Updated Jul 8, 2024 • 3.65k • 38
canopylabs/orpheus-3b-0.1-pretrained

Text-to-Speech • Updated Mar 19 • 19.1k • 141
ibm-granite/granite-speech-3.2-8b

Automatic Speech Recognition • Updated Apr 16 • 1.18k • 80
ByteDance/MegaTTS3

Text-to-Speech • Updated Apr 4 • 1.13k • 371
amphion/Vevo

Text-to-Speech • Updated Apr 13 • 56 • 35
amphion/Vevo1.5

Updated Apr 13 • 83 • 11
kyutai/DailyTalkContiguous

Preview • Updated Mar 24 • 227 • 8
nvidia/parakeet-tdt-0.6b-v2

Automatic Speech Recognition • Updated 3 days ago • 564k • 1.06k
ibm-granite/granite-speech-3.3-8b

Automatic Speech Recognition • Updated 9 days ago • 3.01k • 43
ICTNLP/SLED-TTS-Streaming-Libriheavy

Text-to-Speech • Updated 4 days ago • 8 • 2
ACE-Step/ACE-Step-v1-3.5B

Text-to-Audio • Updated 4 days ago • 466
VITA-MLLM/VITA-Audio-Plus-Vanilla

Updated 19 days ago • 1.15k • 4
ICTNLP/InstructS2S-200K

Viewer • Updated 7 days ago • 200k • 344 • 1
ICTNLP/LLaMA-Omni2-14B

Updated 7 days ago • 25 • 1
laion/empathic-insights-voice

Updated 7 days ago • 88 • 1
disco-eth/EuroSpeech

Viewer • Updated 7 days ago • 8.42M • 27.9k • 57
