Instructions to use HKUSTAudio/AudioX with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Stable Audio Tools
How to use HKUSTAudio/AudioX with Stable Audio Tools:
```python
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download model
model, model_config = get_pretrained_model("HKUSTAudio/AudioX")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

model = model.to(device)

# Set up text and timing conditioning
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
}]

# Generate stereo audio
output = generate_diffusion_cond(
    model,
    conditioning=conditioning,
    sample_size=sample_size,
    device=device
)

# Rearrange audio batch to a single sequence
output = rearrange(output, "b d n -> d (b n)")

# Peak normalize, clip, convert to int16, and save to file
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("output.wav", output, sample_rate)
```
- Notebooks
- Google Colab
- Kaggle
Thoughts on AudioX Licensing for Maximum Community Impact
Dear AudioX Team,
Congratulations on the impressive research behind AudioX! The unified "Anything-to-Audio" framework using a Diffusion Transformer is a really interesting and powerful approach to multi-modal generation.
I was looking into the project and saw the CC BY-NC 4.0 license. While great for academic use, the NonCommercial restriction does limit its potential integration into the wider ecosystem of tools and applications (both open-source and commercial) where generative AI is flourishing.
We're currently seeing many state-of-the-art generative models, including capable multi-modal audio frameworks like MMAudio (which uses MIT), being released under permissive licenses (MIT, Apache 2.0). This openness really seems to fuel community engagement, innovation, and adoption, allowing research to have the broadest possible impact.
I just wanted to share a community perspective and ask whether you might consider aligning AudioX with this trend by adopting a similar permissive open-source license down the line. A model with AudioX's unique capabilities could become even more influential and widely used if the licensing allowed for that broader integration.
Thanks for sharing your groundbreaking work and for considering this feedback!