DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion • Paper 2105.13871 • Published May 28, 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio • Paper 2106.06909 • Published Jun 13, 2021
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis • Paper 2204.09934 • Published Apr 21, 2022
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity • Paper 2302.04023 • Published Feb 8, 2023
SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts • Paper 2105.03036 • Published May 7, 2021
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis • Paper 2309.12792 • Published Sep 22, 2023
MM-LLMs: Recent Advances in MultiModal Large Language Models • Paper 2401.13601 • Published Jan 24, 2024