license: apache-2.0

Magi-1: Autoregressive Video Generation Are Scalable World Models

(Placeholder: add the official image here.)

This repository contains the model code, pre-trained weights, and inference code for Magi-1. You can find more information on our project page.

1. Introduction

We present Magi-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, where a chunk is a fixed-length segment of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, Magi-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, with high temporal consistency and scalability made possible by several algorithmic innovations and a dedicated infrastructure stack. Magi-1 further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control. We believe Magi-1 offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.
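To make the chunk-wise setup concrete, here is a minimal illustrative sketch in Python. It is not the Magi-1 implementation: the chunk length, the helper names, and the linear noise ramp are all assumptions, chosen only to show fixed-length chunks of consecutive frames whose noise level increases monotonically with chunk position.

```python
def split_into_chunks(num_frames, chunk_len):
    """Group frame indices into fixed-length chunks of consecutive frames."""
    if num_frames % chunk_len != 0:
        raise ValueError("this sketch requires num_frames to be a multiple of chunk_len")
    return [list(range(s, s + chunk_len)) for s in range(0, num_frames, chunk_len)]


def per_chunk_noise(num_chunks):
    """Assign each chunk a noise level in (0, 1] that grows with its position.

    Earlier chunks are cleaner and later chunks are noisier. The linear ramp
    is an assumption made for illustration only.
    """
    return [(i + 1) / num_chunks for i in range(num_chunks)]


# Hypothetical sizes: 96 frames split into chunks of 24 frames each.
chunks = split_into_chunks(num_frames=96, chunk_len=24)
noise = per_chunk_noise(len(chunks))
for idx, (frames, level) in enumerate(zip(chunks, noise)):
    print(f"chunk {idx}: frames {frames[0]}-{frames[-1]}, noise level {level:.2f}")
```

Because each chunk only depends on chunks that come before it, earlier (cleaner) chunks can in principle be emitted while later ones are still being denoised, which is the intuition behind streaming generation.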

2. Model and Checkpoints

We provide the pre-trained weights for Magi-1, including the 24B and 4.5B models, as well as the corresponding distill and distill+quant models. Links to the weights are listed in the table below.

| Model | Link | Recommended Machine |
| --- | --- | --- |
| Magi-1-24B | Magi-1-24B | H100/H800 * 8 |
| Magi-1-24B-distill | Magi-1-24B-distill | H100/H800 * 8 |
| Magi-1-24B-distill+fp8_quant | Magi-1-24B-distill+quant | H100/H800 * 4 or RTX 4090 * 8 |
| Magi-1-4.5B | Magi-1-4.5B (Coming Soon) | RTX 4090 * 1 |
| Magi-1-4.5B-distill | Magi-1-4.5B-distill (Coming Soon) | RTX 4090 * 1 |
| Magi-1-4.5B-distill+fp8_quant | Magi-1-4.5B-distill+fp8_quant (Coming Soon) | RTX 4090 * 1 |

3. How to run

3.1 Environment preparation

We provide two ways to run Magi-1, with the Docker environment being the recommended option.

Run with Docker environment (Recommended)

docker pull sandai/magi:latest

docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=6710886 sandai/magi:latest /bin/bash

Run with source code

# Create a new environment and activate it
conda create -n magi python==3.10.12
conda activate magi
# Install pytorch
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# Install other dependencies
pip install -r requirements.txt
# Install the attention kernels from the prebuilt flash_attn_3 wheel (Python 3.10, Linux x86_64)
pip install --no-cache-dir "https://python-artifacts.oss-cn-shanghai.aliyuncs.com/flash_attn_3-3.0.0b2-cp310-cp310-linux_x86_64.whl" --no-deps
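After a source install, a quick sanity check can confirm that the interpreter and the pinned packages match the commands above. The snippet below is a sketch, not part of the repository: the `check_environment` helper is hypothetical, and it relies on package metadata being visible to `importlib.metadata`, which may not hold for every installation method.

```python
import sys
from importlib import metadata

# Versions pinned by the install commands above.
EXPECTED = {"torch": "2.4.0", "torchvision": "0.19.0", "torchaudio": "2.4.0"}


def check_environment():
    """Return {name: (found_version_or_None, matches_expected)}."""
    report = {}
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    report["python"] = (py, py == "3.10")
    for name, wanted in EXPECTED.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "2.4.0+cu124" still count as a match.
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (found, ok)
    return report


for name, (found, ok) in check_environment().items():
    print(f"{name}: {found} [{'ok' if ok else 'check'}]")
```

Any entry marked `check` is either missing or at a different version than the one pinned above.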

3.2 Inference command

# Run 24B Magi-1 model
bash example/24B/run.sh

# Run 4.5B Magi-1 model
bash example/4.5B/run.sh

3.3 Useful configs

| Config | Help |
| --- | --- |
| seed | Random seed used for video generation |
| video_size_h | Height of the video |
| video_size_w | Width of the video |
| num_frames | Number of video frames; controls the duration of the generated video |
| fps | Frames per second; 4 video frames correspond to 1 latent_frame |
| cfg_number | Base models use cfg_number=2; distill and quant models use cfg_number=1 |
| load | Directory containing a model checkpoint |
| t5_pretrained | Path to the pretrained T5 model |
| vae_pretrained | Path to the pretrained VAE model |
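As a worked example of how these settings relate, the sketch below derives clip duration and latent-frame count from `num_frames` and `fps`, and picks a `cfg_number` from the model name. Both helper functions are hypothetical and not part of the repository; rounding up a partial group of frames and matching on the substrings "distill" and "quant" are assumptions.

```python
import math

FRAMES_PER_LATENT = 4  # from the table: 4 video frames map to 1 latent frame


def clip_stats(num_frames, fps):
    """Return (duration in seconds, number of latent frames)."""
    duration_s = num_frames / fps
    # Rounding up for a partial group of frames is an assumption.
    latent_frames = math.ceil(num_frames / FRAMES_PER_LATENT)
    return duration_s, latent_frames


def recommended_cfg_number(model_name):
    """Return 1 for distill/quant variants and 2 for base models."""
    name = model_name.lower()
    return 1 if ("distill" in name or "quant" in name) else 2


print(clip_stats(num_frames=96, fps=24))
print(recommended_cfg_number("Magi-1-24B-distill+fp8_quant"))
```

With these example values, 96 frames at 24 fps give a 4-second clip backed by 24 latent frames, and the distill+fp8_quant variant maps to `cfg_number=1`.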

4. Acknowledgements

5. Contact

If you find our code or model useful in your research, please cite our paper.


If you have any questions, please feel free to raise an issue.