SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering
Abstract
SonicMaster, a unified generative model, improves music audio quality by addressing various artifacts using text-based control and a flow-matching generative training paradigm.
Music recordings often suffer from audio quality issues such as excessive reverberation, distortion, clipping, tonal imbalances, and a narrowed stereo image, especially when created in non-professional settings without specialized equipment or expertise. These problems are typically corrected using separate specialized tools and manual adjustments. In this paper, we introduce SonicMaster, the first unified generative model for music restoration and mastering that addresses a broad spectrum of audio artifacts with text-based control. SonicMaster is conditioned on natural language instructions to apply targeted enhancements, or can operate in an automatic mode for general restoration. To train this model, we construct the SonicMaster dataset, a large dataset of paired degraded and high-quality tracks by simulating common degradation types with nineteen degradation functions belonging to five enhancements groups: equalization, dynamics, reverb, amplitude, and stereo. Our approach leverages a flow-matching generative training paradigm to learn an audio transformation that maps degraded inputs to their cleaned, mastered versions guided by text prompts. Objective audio quality metrics demonstrate that SonicMaster significantly improves sound quality across all artifact categories. Furthermore, subjective listening tests confirm that listeners prefer SonicMaster's enhanced outputs over the original degraded audio, highlighting the effectiveness of our unified approach.
Community
First all-in-one music restoration and mastering model. Listen to some examples here: https://amaai-lab.github.io/SonicMaster/
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment (2025)
- Token-based Audio Inpainting via Discrete Diffusion (2025)
- DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization (2025)
- MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners (2025)
- BNMusic: Blending Environmental Noises into Personalized Music (2025)
- SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture (2025)
- FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 1
Datasets citing this paper 1
Spaces citing this paper 0
No Space linking this paper