-
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 9 -
mozilla-foundation/common_voice_17_0
Viewer • Updated • 13M • 25.4k • 171 -
Stable Audio Open
Paper • 2407.14358 • Published • 23 -
fnlp/AnyGPT-chat
Text Generation • Updated • 1.1k • 15
Collections
Discover the best community collections!
Collections including paper arxiv:2311.07919
-
ModaVerse: Efficiently Transforming Modalities with LLMs
Paper • 2401.06395 • Published • 3 -
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Paper • 2401.00246 • Published • 10 -
An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Paper • 2312.03668 • Published • 1 -
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Paper • 2311.06753 • Published • 6
-
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 7 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 9 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 29
-
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 2 -
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Paper • 2403.16973 • Published • 2 -
High Fidelity Neural Audio Compression
Paper • 2210.13438 • Published • 3 -
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 7
-
Qwen Technical Report
Paper • 2309.16609 • Published • 34 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 6 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 9 -
193📷🎨👀
Qwen-VL-Plus
-
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 19 -
Structural Similarities Between Language Models and Neural Response Measurements
Paper • 2306.01930 • Published • 2 -
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Paper • 2006.14941 • Published • 2 -
NU-GAN: High resolution neural upsampling with GAN
Paper • 2010.11362 • Published • 2
-
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 29 -
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
Paper • 1709.07857 • Published • 2 -
Simple synthetic data reduces sycophancy in large language models
Paper • 2308.03958 • Published • 21 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 6