TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling Paper • 2508.16790 • Published 7 days ago • 3
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models Paper • 2304.00830 • Published Apr 3, 2023 • 2
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Paper • 2403.03100 • Published Mar 5, 2024 • 39
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis Paper • 2404.03204 • Published Apr 4, 2024 • 10
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words Paper • 2406.13340 • Published Jun 19, 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation Paper • 2407.05361 • Published Jul 7, 2024 • 2
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer Paper • 2409.00750 • Published Sep 1, 2024 • 4
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training Paper • 2502.03128 • Published Feb 5 • 1
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement Paper • 2501.15417 • Published Jan 26 • 1
DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation Paper • 2505.13000 • Published May 19 • 1
NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations Paper • 2508.04195 • Published 23 days ago • 1