---
license: apache-2.0
---
# Magi-1: Autoregressive Video Generation at Scale
This repository contains the pre-trained weights and inference code for the Magi-1 model. You can find more information on our project page.
## 1. Introduction
We present Magi-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, each defined as a fixed-length segment of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, Magi-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, made possible by several algorithmic innovations and a dedicated infrastructure stack. Magi-1 further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control. We believe Magi-1 offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.
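To make the chunk-wise scheme concrete, below is a toy sketch of staggered chunk denoising. Everything here (chunk length, shapes, step counts, the stand-in denoiser) is an illustrative assumption, not Magi-1's actual implementation:

```python
# Toy sketch of pipelined chunk-wise denoising (illustrative only; all names,
# shapes, and the schedule are assumptions, not Magi-1's actual API).
import torch

CHUNK_FRAMES = 24   # frames per fixed-length chunk (assumed)
NUM_CHUNKS = 4
STEPS = 8           # denoising steps per chunk (assumed)
LAG = 2             # chunk i starts LAG steps after chunk i-1

def denoise_step(chunk, context):
    """Stand-in for the denoiser: in Magi-1 this is a diffusion transformer
    that attends causally to the earlier, less-noisy chunks in `context`."""
    return 0.8 * chunk

chunks = [torch.randn(CHUNK_FRAMES, 3, 64, 64) for _ in range(NUM_CHUNKS)]
done = [0] * NUM_CHUNKS

for step in range(LAG * (NUM_CHUNKS - 1) + STEPS):
    for i in range(NUM_CHUNKS):
        if step >= i * LAG and done[i] < STEPS:
            # At any moment, chunk i has taken more denoising steps than
            # chunk i+1, so noise increases monotonically along the sequence;
            # fully denoised chunks can be streamed out immediately.
            chunks[i] = denoise_step(chunks[i], chunks[:i])
            done[i] += 1

video = torch.cat(chunks)  # (NUM_CHUNKS * CHUNK_FRAMES, 3, 64, 64)
```

The staggered start is what distinguishes this from plain sequential generation: several chunks are in flight at different noise levels, which is what enables streaming and pipelined inference.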
## 2. Model and Checkpoints
We provide pre-trained weights for Magi-1 in 24B and 4.5B sizes, along with the corresponding distilled and distilled+quantized variants. The weight links are shown in the table below.
| Model | Link | Recommended Machine |
|---|---|---|
| Magi-1-24B | Magi-1-24B | H100/H800 * 8 |
| Magi-1-24B-distill | Magi-1-24B-distill | H100/H800 * 8 |
| Magi-1-24B-distill+fp8_quant | Magi-1-24B-distill+quant | H100/H800 * 4 or RTX 4090 * 8 |
| Magi-1-4.5B | Magi-1-4.5B (Coming Soon) | RTX 4090 * 1 |
| Magi-1-4.5B-distill | Magi-1-4.5B-distill (Coming Soon) | RTX 4090 * 1 |
| Magi-1-4.5B-distill+fp8_quant | Magi-1-4.5B-distill+fp8_quant (Coming Soon) | RTX 4090 * 1 |
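If the checkpoints are fetched from Hugging Face, a download might look like the sketch below. The repo id and subfolder pattern are assumptions; substitute the actual locations linked in the table above:

```python
# Sketch: fetch a checkpoint with huggingface_hub. The repo id and the
# allow_patterns subfolder are assumptions -- use the links in the table.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="sandai-org/MAGI-1",              # assumed repo id
    allow_patterns=["ckpt/magi/24B_base/*"],  # hypothetical subfolder
)
print("checkpoint downloaded to:", ckpt_dir)
```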
## 3. How to run
### 3.1 Environment preparation
We provide two ways to run Magi-1, with the Docker environment being the recommended option.
#### Run with Docker environment (recommended)

```bash
docker pull sandai/magi:latest
docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 sandai/magi:latest /bin/bash
```
Run with source code
# Create a new environment
conda create -n magi python==3.10.12
# Install pytorch
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# Install other dependencies
pip install -r requirements.txt
# Install magi-attention, new install method
pip install --no-cache-dir "https://python-artifacts.oss-cn-shanghai.aliyuncs.com/flash_attn_3-3.0.0b2-cp310-cp310-linux_x86_64.whl" --no-deps
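After installing, a quick sanity check (plain Python, no Magi-specific code) can confirm that the pinned PyTorch build sees your GPUs:

```python
# Verify the environment matches the versions pinned above.
import torch

assert torch.__version__.startswith("2.4"), torch.__version__
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```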
### 3.2 Inference command
```bash
# Run the 24B Magi-1 model
bash example/24B/run.sh

# Run the 4.5B Magi-1 model
bash example/4.5B/run.sh
```
### 3.3 Useful configs
| Config | Help |
|---|---|
| `seed` | Random seed used for video generation |
| `video_size_h` | Height of the generated video |
| `video_size_w` | Width of the generated video |
| `num_frames` | Number of frames to generate, which controls the duration of the video |
| `fps` | Frames per second of the generated video; 4 video frames correspond to 1 latent frame |
| `cfg_number` | The base model uses `cfg_number=2`; the distilled and quantized models use `cfg_number=1` |
| `load` | Directory containing the model checkpoint |
| `t5_pretrained` | Path to the pretrained T5 model |
| `vae_pretrained` | Path to the pretrained VAE model |
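As an illustration of how these configs fit together, the sketch below launches inference with explicit flags. The entry-point path is a placeholder and the values are arbitrary examples; the real invocation and flag spelling live in `example/24B/run.sh`:

```python
# Hypothetical launcher: the script path is a placeholder; the flags mirror
# the config table above (check example/24B/run.sh for the real invocation).
import subprocess

subprocess.run([
    "python", "path/to/magi_entry.py",  # placeholder entry point
    "--seed", "1234",
    "--video_size_h", "720",
    "--video_size_w", "1280",
    "--num_frames", "96",               # 4 s of video at fps=24
    "--fps", "24",
    "--cfg_number", "2",                # 2 for the base model, 1 for distill/quant
    "--load", "/path/to/Magi-1-24B",
    "--t5_pretrained", "/path/to/t5",
    "--vae_pretrained", "/path/to/vae",
], check=True)
```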
## 4. Acknowledgements
## 5. Contact
Please feel free to cite our paper if you find our code or model useful in your research.
If you have any questions, please feel free to raise an issue.