---
license: apache-2.0
---
# Magi-1: Autoregressive Video Generation at Scale
This repository contains the pre-trained weights and inference code for the Magi-1 model. You can find more information on our project page.
## 1. Introduction
We present Magi-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, each defined as a fixed-length segment of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, Magi-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, made possible by several algorithmic innovations and a dedicated infrastructure stack. Magi-1 further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control. We believe Magi-1 offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.
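To make the chunk-wise scheme concrete, below is a toy sketch of staggered chunk denoising. Everything here (chunk length, shapes, step counts, the stand-in denoiser) is an illustrative assumption, not Magi-1's actual implementation:

```python
# Toy sketch of pipelined chunk-wise denoising (illustrative only; all names,
# shapes, and the schedule are assumptions, not Magi-1's actual API).
import torch

CHUNK_FRAMES = 24   # frames per fixed-length chunk (assumed)
NUM_CHUNKS = 4
STEPS = 8           # denoising steps per chunk (assumed)
LAG = 2             # chunk i starts LAG steps after chunk i-1

def denoise_step(chunk, context):
    """Stand-in for the denoiser: in Magi-1 this is a diffusion transformer
    that attends causally to the earlier, less-noisy chunks in `context`."""
    return 0.8 * chunk

chunks = [torch.randn(CHUNK_FRAMES, 3, 64, 64) for _ in range(NUM_CHUNKS)]
done = [0] * NUM_CHUNKS

for step in range(LAG * (NUM_CHUNKS - 1) + STEPS):
    for i in range(NUM_CHUNKS):
        if step >= i * LAG and done[i] < STEPS:
            # At any moment, chunk i has taken more denoising steps than
            # chunk i+1, so noise increases monotonically along the sequence;
            # fully denoised chunks can be streamed out immediately.
            chunks[i] = denoise_step(chunks[i], chunks[:i])
            done[i] += 1

video = torch.cat(chunks)  # (NUM_CHUNKS * CHUNK_FRAMES, 3, 64, 64)
```

The staggered start is what distinguishes this from plain sequential generation: several chunks are in flight at different noise levels, which is what enables streaming and pipelined inference.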
## 2. Model and Checkpoints
We provide pre-trained weights for Magi-1 in 24B and 4.5B sizes, along with the corresponding distilled and distilled+quantized variants. The weight links are shown in the table below.
| Model | Link | Recommended Machine |
|---|---|---|
| Magi-1-24B | Magi-1-24B | H100/H800 * 8 |
| Magi-1-24B-distill | Magi-1-24B-distill | H100/H800 * 8 |
| Magi-1-24B-distill+fp8_quant | Magi-1-24B-distill+quant | H100/H800 * 4 or RTX 4090 * 8 |
| Magi-1-4.5B | Magi-1-4.5B (Coming Soon) | RTX 4090 * 1 |
| Magi-1-4.5B-distill | Magi-1-4.5B-distill (Coming Soon) | RTX 4090 * 1 |
| Magi-1-4.5B-distill+fp8_quant | Magi-1-4.5B-distill+fp8_quant (Coming Soon) | RTX 4090 * 1 |
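If the checkpoints are fetched from Hugging Face, a download might look like the sketch below. The repo id and subfolder pattern are assumptions; substitute the actual locations linked in the table above:

```python
# Sketch: fetch a checkpoint with huggingface_hub. The repo id and the
# allow_patterns subfolder are assumptions -- use the links in the table.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="sandai-org/MAGI-1",              # assumed repo id
    allow_patterns=["ckpt/magi/24B_base/*"],  # hypothetical subfolder
)
print("checkpoint downloaded to:", ckpt_dir)
```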
## 3. How to run
### 3.1 Environment preparation
We provide two ways to run Magi-1, with the Docker environment being the recommended option.
#### Run with Docker environment (recommended)

```bash
docker pull sandai/magi:latest
docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 sandai/magi:latest /bin/bash
```
Run with source code
# Create a new environment
conda create -n magi python==3.10.12
# Install pytorch
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# Install other dependencies
pip install -r requirements.txt
# Install magi-attention, new install method
pip install --no-cache-dir "https://python-artifacts.oss-cn-shanghai.aliyuncs.com/flash_attn_3-3.0.0b2-cp310-cp310-linux_x86_64.whl" --no-deps
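After installing, a quick sanity check (plain Python, no Magi-specific code) can confirm that the pinned PyTorch build sees your GPUs:

```python
# Verify the environment matches the versions pinned above.
import torch

assert torch.__version__.startswith("2.4"), torch.__version__
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
```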
### 3.2 Inference command
```bash
# Run the 24B Magi-1 model
bash example/24B/run.sh

# Run the 4.5B Magi-1 model
bash example/4.5B/run.sh
```
### 3.3 Useful configs
| Config | Help |
|---|---|
| `seed` | Random seed used for video generation |
| `video_size_h` | Height of the generated video |
| `video_size_w` | Width of the generated video |
| `num_frames` | Number of frames to generate, which controls the duration of the video |
| `fps` | Frames per second of the generated video; 4 video frames correspond to 1 latent frame |
| `cfg_number` | The base model uses `cfg_number=2`; the distilled and quantized models use `cfg_number=1` |
| `load` | Directory containing the model checkpoint |
| `t5_pretrained` | Path to the pretrained T5 model |
| `vae_pretrained` | Path to the pretrained VAE model |
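As an illustration of how these configs fit together, the sketch below launches inference with explicit flags. The entry-point path is a placeholder and the values are arbitrary examples; the real invocation and flag spelling live in `example/24B/run.sh`:

```python
# Hypothetical launcher: the script path is a placeholder; the flags mirror
# the config table above (check example/24B/run.sh for the real invocation).
import subprocess

subprocess.run([
    "python", "path/to/magi_entry.py",  # placeholder entry point
    "--seed", "1234",
    "--video_size_h", "720",
    "--video_size_w", "1280",
    "--num_frames", "96",               # 4 s of video at fps=24
    "--fps", "24",
    "--cfg_number", "2",                # 2 for the base model, 1 for distill/quant
    "--load", "/path/to/Magi-1-24B",
    "--t5_pretrained", "/path/to/t5",
    "--vae_pretrained", "/path/to/vae",
], check=True)
```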
## 4. Acknowledgements
## 5. Contact
Please feel free to cite our paper if you find our code or model useful in your research.
If you have any questions, please feel free to raise an issue.