SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published 1 day ago • 34
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions • Paper • 1712.05884 • Published Dec 16, 2017 • 3
Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation • Paper • 2411.18447 • Published Nov 27, 2024 • 2
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation • Paper • 2501.15907 • Published 3 days ago • 14
Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published 3 days ago • 19
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models • Paper • 2402.13071 • Published Feb 20, 2024 • 1
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models • Paper • 2406.02430 • Published Jun 4, 2024 • 33
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models • Paper • 2412.10117 • Published Dec 13, 2024 • 3
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens • Paper • 2407.05407 • Published Jul 7, 2024 • 1
Moshi: a speech-text foundation model for real-time dialogue • Paper • 2410.00037 • Published Sep 17, 2024 • 2
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models • Paper • 2403.03100 • Published Mar 5, 2024 • 36
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model • Paper • 2408.17175 • Published Aug 30, 2024 • 2
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec • Paper • 2305.02765 • Published May 4, 2023 • 1
LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models • Paper • 2303.12984 • Published Mar 23, 2023 • 1
VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music • Paper • 2412.17667 • Published Dec 23, 2024 • 1