## What this is

We're exploring AI‑assisted loop‑based music creation that can run on GPUs (not just TPUs) and stream to apps in realtime.
### Implemented backends

- **HTTP (bar‑aligned):** `/generate`, `/jam/start`, `/jam/next`, `/jam/stop`, `/jam/update`, etc. (client sketch after this list)
- **WebSocket (realtime):** `ws://…/ws/jam` with `mode="rt"` (Colab‑style continuous chunks). New in this build; a consumer sketch follows below.
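
To make the HTTP flow concrete, here is a minimal client sketch using `requests`. Only the endpoint paths come from the list above; the host/port, JSON field names (`bpm`, `bars`, `session_id`, `audio`), and request shapes are assumptions for illustration.

```python
import requests

BASE = "http://localhost:8000"  # assumed host/port

# One-shot, bar-aligned generation.
r = requests.post(f"{BASE}/generate", json={"bpm": 120, "bars": 4})
r.raise_for_status()

# Jam session: start, pull a few bar-aligned chunks, tweak params mid-stream, stop.
session = requests.post(f"{BASE}/jam/start", json={"bpm": 120}).json()
sid = session["session_id"]  # field name assumed
for _ in range(8):
    chunk = requests.post(f"{BASE}/jam/next", json={"session_id": sid}).json()
    # each chunk would carry one bar of encoded audio, e.g. chunk["audio"]
requests.post(f"{BASE}/jam/update", json={"session_id": sid, "bpm": 128})
requests.post(f"{BASE}/jam/stop", json={"session_id": sid})
```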
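And a hedged consumer for the realtime WebSocket mode, using the third-party `websockets` package. The `mode="rt"` handshake matches the endpoint above, but the framing (a JSON control message up front, then binary audio frames) is an assumption, as is the raw-PCM output format.

```python
import asyncio
import json

import websockets


async def consume(url: str = "ws://localhost:8000/ws/jam") -> None:
    async with websockets.connect(url) as ws:
        # Handshake: request continuous realtime chunks.
        await ws.send(json.dumps({"mode": "rt"}))
        with open("jam.pcm", "wb") as out:
            async for msg in ws:
                if isinstance(msg, bytes):
                    out.write(msg)  # one audio chunk (format assumed)
                else:
                    print("control frame:", msg)


asyncio.run(consume())
```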
## What we learned (GPU notes)

- **L40S 48GB:** comfortably **faster than realtime** → we added a `pace: "realtime"` switch so the server doesn't outrun playback (pacing sketch after this list).
- **L4 24GB:** **consistently just under realtime**; even with pre‑roll buffering, TF32/JAX tuning, a reduced chunk size, and the **base** checkpoint, we still hit buffer under‑runs eventually.
- **Implication:** for production‑quality realtime, budget ~**40GB of VRAM** per user/session (e.g., an **A100 40GB**, or MIG slices of ≈**35–40GB** on newer parts). Smaller GPUs can run demos, but sustained realtime is not reliable.
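
A minimal sketch of what the `pace: "realtime"` switch does, assuming chunked generation in a server-side loop: if a 2.0 s chunk renders in less than 2.0 s of wall clock, sleep off the surplus so the client's buffer doesn't grow without bound. `generate_chunk` is a placeholder, not the server's actual function.

```python
import time

CHUNK_SECONDS = 2.0  # matches the default chunk length below


def generate_chunk() -> bytes:
    """Placeholder for the model's chunk renderer."""
    return b"\x00" * 4  # stub


def paced_stream(pace: str = "realtime"):
    """Yield chunks, throttled to playback rate when pace == "realtime"."""
    while True:
        start = time.monotonic()
        chunk = generate_chunk()
        if pace == "realtime":
            elapsed = time.monotonic() - start
            time.sleep(max(0.0, CHUNK_SECONDS - elapsed))
        yield chunk
```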
## Model / audio specs

- **Model:** MagentaRT (T5X; decoder RVQ depth = 16)
- **Audio:** 48 kHz stereo, 2.0 s chunks by default, 40 ms crossfade between chunks (stitching sketch below)
- **Context:** 10 s rolling context window
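
For reference, a sketch of stitching consecutive chunks with the 40 ms crossfade, assuming float32 stereo arrays at 48 kHz and an equal-power curve (the server's actual fade shape is not documented here).

```python
import numpy as np

SR = 48_000
XFADE = int(0.040 * SR)  # 40 ms -> 1920 samples


def stitch(prev: np.ndarray, new: np.ndarray) -> np.ndarray:
    """Overlap-add the tail of `prev` with the head of `new`.

    Both inputs are float32 arrays of shape (samples, 2).
    """
    t = np.linspace(0.0, np.pi / 2, XFADE, dtype=np.float32)[:, None]
    fade_out, fade_in = np.cos(t), np.sin(t)  # equal-power pair
    blended = prev[-XFADE:] * fade_out + new[:XFADE] * fade_in
    return np.concatenate([prev[:-XFADE], blended, new[XFADE:]])
```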