MagentaRT Research API

Name	What it does	Example	When to set
`MRT_CKPT_REPO`	Hugging Face repo ID that hosts your finetune checkpoints/assets.	`thepatch/magenta-ft`	Set to make this finetune the default on boot.
`MRT_CKPT_STEP`	Checkpoint step number to load on boot.	`1863001`	Set if you want a specific checkpoint preselected.
`MRT_SIZE`	Base model family used by the finetune (e.g., large).	`large`	Set to match the base you finetuned from.
`SPACE_MODE`	Controls readiness behavior: `serve` (GPU, ready to generate) vs `template` (CPU template for duplication). If unset, the server auto-detects.	`serve` or `template`	Set for explicit behavior; otherwise it falls back to auto-detection.
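
As a rough illustration of how these four variables might be consumed at boot, here is a minimal sketch. `BootConfig` and `read_boot_config` are hypothetical names, not part of this server's code; the only things taken from the table are the variable names, the example values, and the rule that an unset `SPACE_MODE` means auto-detection.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class BootConfig:
    ckpt_repo: Optional[str]   # MRT_CKPT_REPO, e.g. "thepatch/magenta-ft"
    ckpt_step: Optional[int]   # MRT_CKPT_STEP, e.g. 1863001
    size: Optional[str]        # MRT_SIZE, e.g. "large"
    space_mode: Optional[str]  # SPACE_MODE; None means "auto-detect"


def read_boot_config(env: Mapping[str, str] = os.environ) -> BootConfig:
    """Read the optional variables; unset or empty values become None."""

    def get(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None

    step = get("MRT_CKPT_STEP")
    mode = get("SPACE_MODE")
    # Assumption: only the two documented modes are accepted.
    if mode is not None and mode not in ("serve", "template"):
        raise ValueError(f"SPACE_MODE must be 'serve' or 'template', got {mode!r}")
    return BootConfig(
        ckpt_repo=get("MRT_CKPT_REPO"),
        ckpt_step=int(step) if step is not None else None,
        size=get("MRT_SIZE"),
        space_mode=mode,
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to try out without touching the real environment.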

Overview

This API powers AI music generation with Google's MagentaRT. It is designed for real-time audio streaming using finetunes hosted on Hugging Face, and it is built for iOS app integration with WebSocket streaming support.

Hardware Requirements: Real-time streaming requires an L40S GPU (48GB VRAM). An L4 (24GB) comes close but does not sustain real-time generation; if you know an optimization that closes the gap, please let me know.

Quick Start - WebSocket Streaming

Connect to wss://<your-space>/ws/jam for real-time audio generation:

Start Real-time Generation

{
  "type": "start",
  "mode": "rt",
  "binary_audio": false,
  "params": {
    "styles": "electronic, ambient",
    "style_weights": "1.0, 0.8",
    "temperature": 1.1,
    "topk": 40,
    "guidance_weight": 1.1,
    "pace": "realtime",
    "style_ramp_seconds": 8.0,
    "mean": 0.0,
    "centroid_weights": "0.0, 0.0, 0.0"
  }
}

Update Parameters Live

{
  "type": "update",
  "styles": "jazz, hiphop",
  "style_weights": "1.0, 0.8",
  "temperature": 1.2,
  "topk": 64,
  "guidance_weight": 1.0,
  "mean": 0.2,
  "centroid_weights": "0.1, 0.3, 0.0"
}

Stop Generation

{"type": "stop"}
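
The three messages above can be assembled programmatically. The sketch below only builds the JSON text; sending it requires a WebSocket client of your choice, which is left out here. The helper names (`join_csv`, `start_message`, `update_message`, `stop_message`) are hypothetical, and the one-weight-per-style check is an assumption drawn from the examples rather than a documented rule.

```python
import json
from typing import Sequence


def join_csv(values: Sequence) -> str:
    """Format a list as the comma-separated string used in the examples."""
    return ", ".join(str(v) for v in values)


def start_message(styles: Sequence[str], style_weights: Sequence[float], **extra) -> str:
    """Build a "start" message; generation settings are nested under "params"."""
    # Assumption: one weight per style, as in the examples above.
    if len(styles) != len(style_weights):
        raise ValueError("styles and style_weights must have the same length")
    params = {
        "styles": join_csv(styles),
        "style_weights": join_csv(style_weights),
        **extra,  # e.g. temperature, topk, guidance_weight, pace
    }
    return json.dumps(
        {"type": "start", "mode": "rt", "binary_audio": False, "params": params}
    )


def update_message(**fields) -> str:
    """Build an "update" message; fields sit next to "type", not under "params"."""
    return json.dumps({"type": "update", **fields})


def stop_message() -> str:
    return json.dumps({"type": "stop"})


if __name__ == "__main__":
    print(start_message(["electronic", "ambient"], [1.0, 0.8], temperature=1.1, topk=40))
    print(update_message(styles="jazz, hiphop", style_weights="1.0, 0.8", topk=64))
    print(stop_message())
```

Note the structural difference the examples show: "start" wraps its settings in "params", while "update" is flat.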

API Endpoints

POST /generate - Generate 4–8 bars of music with input audio

POST /generate_style - Generate music from style prompts only (experimental)

POST /jam/start - Start continuous jamming session

GET /jam/next - Get next audio chunk from session

POST /jam/consume - Mark chunk as consumed

POST /jam/stop - End jamming session

WEBSOCKET /ws/jam - Real-time streaming interface

POST /model/select - Switch between base and fine-tuned models
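
Since every endpoint hangs off the same Space host, a client can derive all URLs from one base URL. The following is a small sketch using only the standard library; `http_url` and `ws_url` are hypothetical helper names, and the mapping from `https` to `wss` is inferred from the `wss://<your-space>/ws/jam` address shown earlier.

```python
from urllib.parse import urlsplit, urlunsplit


def http_url(space_url: str, path: str) -> str:
    """Return the full URL for an HTTP endpoint such as "/jam/start"."""
    parts = urlsplit(space_url)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def ws_url(space_url: str, path: str = "/ws/jam") -> str:
    """Return the WebSocket URL for the same host (https -> wss, http -> ws)."""
    parts = urlsplit(space_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    base = "https://your-username-magenta-retry.hf.space"
    print(http_url(base, "/jam/start"))
    print(ws_url(base))
```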

Custom Fine-Tuning

Train your own MagentaRT models and use them with this API and the iOS app.

1. Train Your Model

Use the official MagentaRT fine-tuning notebook:

🔗 MagentaRT Fine-tuning Colab

This will create checkpoint folders like:

checkpoint_1861001/
checkpoint_1862001/
It also produces the steering assets cluster_centroids.npy and mean_style_embed.npy.

2. Package Checkpoints

Checkpoints must be compressed as .tgz files to preserve .zarray files correctly.

Important: Do not download checkpoint folders directly from Google Drive - the .zarray files won't transfer properly.

Checkpoint Packaging Script

Use this in a Colab cell to properly package your checkpoints:

# Mount Drive to access your trained checkpoints
from google.colab import drive
drive.mount('/content/drive')

# Set the path to your checkpoint folder
CKPT_SRC = '/content/drive/MyDrive/thepatch/checkpoint_1862001'  # Adjust path

# Copy folder to local storage (preserves dotfiles)
!rm -rf /content/checkpoint_1862001
!cp -a "$CKPT_SRC" /content/

# Verify .zarray files are present
!find /content/checkpoint_1862001 -name .zarray | wc -l

# Create properly formatted .tgz archive
!tar -C /content -czf /content/checkpoint_1862001.tgz checkpoint_1862001

# Verify critical files are in the archive
!tar -tzf /content/checkpoint_1862001.tgz | grep -cF '.zarray'

# Download the .tgz file
from google.colab import files
files.download('/content/checkpoint_1862001.tgz')
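
If you would rather check the archive from Python (for example on your own machine after downloading), the same verification can be sketched with the standard `tarfile` module. `count_zarray_members` is a hypothetical helper name; a result of zero suggests the dotfiles were lost before packaging.

```python
import tarfile


def count_zarray_members(tgz_path: str) -> int:
    """Count archive members whose file name is exactly ".zarray"."""
    with tarfile.open(tgz_path, "r:gz") as archive:
        return sum(
            1 for name in archive.getnames() if name.split("/")[-1] == ".zarray"
        )


if __name__ == "__main__":
    # Adjust the path to wherever the packaged checkpoint was saved.
    print(count_zarray_members("checkpoint_1862001.tgz"))
```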

3. Upload to Hugging Face

Create a model repository and upload:

Your .tgz checkpoint files
cluster_centroids.npy (for steering)
mean_style_embed.npy (for steering)

Example Repository: thepatch/magenta-ft
Shows the correct file structure with .tgz files and .npy steering assets in the root directory.

4. Use in the App

In the iOS app's model selector, point to your Hugging Face repository URL. The app will automatically discover available checkpoints and allow switching between them.
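
The app's discovery logic is not shown here, but one plausible approach is to scan the repository's file list for archive names. The sketch below assumes archives are named `checkpoint_<step>.tgz` and sit in the repository root, as the packaging script above produces; `checkpoint_steps` is a hypothetical helper. The file list itself could come from any source, for instance the `list_repo_files` function in `huggingface_hub`.

```python
import re
from typing import Iterable, List

# Assumption: archives follow the "checkpoint_<step>.tgz" naming used above.
_CKPT_RE = re.compile(r"^checkpoint_(\d+)\.tgz$")


def checkpoint_steps(filenames: Iterable[str]) -> List[int]:
    """Return the sorted checkpoint step numbers found in a list of file names."""
    steps = []
    for name in filenames:
        match = _CKPT_RE.match(name)
        if match:
            steps.append(int(match.group(1)))
    return sorted(steps)


if __name__ == "__main__":
    repo_files = [
        "checkpoint_1862001.tgz",
        "checkpoint_1861001.tgz",
        "cluster_centroids.npy",
        "mean_style_embed.npy",
    ]
    print(checkpoint_steps(repo_files))
```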

Technical Specifications

Audio Format: 48 kHz stereo, ~2.0s chunks with ~40ms crossfade
Model Sizes: Base and Large variants available
Steering: Support for text prompts, audio embeddings, and centroid-based fine-tune steering
Real-time Performance: L40S recommended; an L4 generates slightly slower than real time
Memory Requirements: ~40GB VRAM for sustained real-time streaming
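
To make the audio format figures concrete, the sketch below works out the chunk and crossfade sizes in sample frames and blends two overlapping regions. The crossfade curve is not specified above, so the linear ramp here is an assumption, and `linear_crossfade` is a hypothetical helper operating on one channel at a time.

```python
from typing import List, Sequence

SAMPLE_RATE = 48_000       # Hz, from the audio format above
CHANNELS = 2               # stereo
CHUNK_SECONDS = 2.0        # approximate chunk length
CROSSFADE_SECONDS = 0.040  # approximate crossfade length

CHUNK_FRAMES = round(SAMPLE_RATE * CHUNK_SECONDS)          # frames per channel
CROSSFADE_FRAMES = round(SAMPLE_RATE * CROSSFADE_SECONDS)  # frames per channel


def linear_crossfade(tail: Sequence[float], head: Sequence[float]) -> List[float]:
    """Blend the end of one chunk (tail) into the start of the next (head).

    Assumption: a linear ramp; the server's actual curve may differ.
    """
    n = len(tail)
    if n == 0 or n != len(head):
        raise ValueError("tail and head must be non-empty and equally long")
    out = []
    for i in range(n):
        w = (i + 0.5) / n  # weight of the incoming chunk, rising from ~0 to ~1
        out.append((1.0 - w) * tail[i] + w * head[i])
    return out


if __name__ == "__main__":
    print(CHUNK_FRAMES, CROSSFADE_FRAMES)
    print(linear_crossfade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
```

At these settings the crossfade covers about 2% of each chunk, so consecutive chunks overlap only briefly at their boundaries.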

Note: The /generate_style endpoint is experimental and may not adhere to BPM, because it has no input audio to use as context. We are considering seeding it with a metronome-based context instead of silence.

Integration with iOS App

This API is designed to work seamlessly with our iOS music generation app:

Real-time audio streaming via WebSockets
Dynamic model switching between base and fine-tuned models
Integration with stable-audio-open-small for combined input audio generation
Live parameter adjustment during generation

Deployment

To run your own instance:

Duplicate this Hugging Face Space
Ensure you have access to an L40S GPU
Point your iOS app to the new space URL (e.g., https://your-username-magenta-retry.hf.space)
Upload your fine-tuned models as described above

Support & Contact

This is an active research project. For questions, technical support, or collaboration:

Email: kev@thecollabagepatch.com

Research Status: This project is under active development. Features and API may change. We welcome feedback and contributions from the research community.

Licensing

Built on Google's MagentaRT (Apache 2.0 + CC-BY 4.0). Users are responsible for their generated outputs and ensuring compliance with applicable laws and platform policies.
