xiguan97 committed · Commit 2300e78 · verified · 1 Parent(s): 2c1acc6

Update README.md

Files changed (1)
  1. README.md +158 -34
README.md CHANGED
@@ -2,77 +2,191 @@
  license: apache-2.0
  ---

- # Magi-1: Autoregressive Video Generation Are Scalable World Models

- <!-- TODO: add image -->
- <div align="center" style="margin-top: 0px; margin-bottom: 0px;">
- <img src=https://github.com/user-attachments/.... width="30%"/>
- Add the official image here
  </div>
- -----

- This repository contains the code for the Magi-1 model, pre-trained weights and inference code. You can find more information on our [project page](http://sand.ai).

- ## 1. Introduction

- We present magi, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, magi enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. Magi further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control. We believe magi offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.

- ## 2. Model and Checkpoints

- We provide the pre-trained weights for Magi-1, including the 24B and 4.5B models, as well as the corresponding distill and distill+quant models. The model weight links are shown in the table.
  | Model | Link | Recommended Machine |
  | ----------------------------- | ------------------------------------------------------------ | ------------------------------- |
- | Magi-1-24B | [Magi-1-24B](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_base) | H100/H800 \* 8 |
- | Magi-1-24B-distill | [Magi-1-24B-distill](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_distill) | H100/H800 \* 8 |
- | Magi-1-24B-distill+fp8_quant | [Magi-1-24B-distill+quant](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_distill_quant) | H100/H800 \* 4 or RTX 4090 \* 8 |
- | Magi-1-4.5B | Magi-1-4.5B (Coming Soon) | RTX 4090 \* 1 |
- | Magi-1-4.5B-distill | Magi-1-4.5B-distill (Coming Soon) | RTX 4090 \* 1 |
- | Magi-1-4.5B-distill+fp8_quant | Magi-1-4.5B-distill+fp8_quant (Coming Soon) | RTX 4090 \* 1 |
 
- ## 3. How to run

- ### 3.1 Environment preparation

- We provide two ways to run Magi-1, with the Docker environment being the recommended option.

- **Run with docker environment (Recommend)**

  ```bash
- docker pull magi/magi:latest

  docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=6710886 sandai/magi:latest /bin/bash
  ```
- **Run with source code**

  ```bash
  # Create a new environment
  conda create -n magi python==3.10.12

  # Install pytorch
  conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

  # Install other dependencies
  pip install -r requirements.txt
- # Install magi-attention, new install method
- pip install --no-cache-dir "https://python-artifacts.oss-cn-shanghai.aliyuncs.com/flash_attn_3-3.0.0b2-cp310-cp310-linux_x86_64.whl" --no-deps
  ```
- ### 3.2 Inference command

  ```bash
- # Run 24B Magi-1 model
  bash example/24B/run.sh

- # Run 4.5B Magi-1 model
  bash example/4.5B/run.sh
  ```

- ### 3.3 Useful configs

  | Config | Help |
  | -------------- | ------------------------------------------------------------ |
@@ -87,13 +201,23 @@ bash example/4.5B/run.sh
  | vae_pretrained | Path to load pretrained VAE model |

- ## 4. Acknowledgements

- ## 5. Contact

- Please feel free to cite our paper if you find our code or model useful in your research.

  ```
- ```

- If you have any questions, please feel free to raise an issue.
  license: apache-2.0
  ---

+ ![magi-logo](figures/logo_black.png)

+ -----
+
+ <p align="center">
+ <a href="https://static.magi.world/static/files/MAGI_1.pdf"><img alt="paper" src="https://img.shields.io/badge/Paper-arXiv-B31B1B?logo=arxiv"></a>
+ <a href="https://sand.ai"><img alt="blog" src="https://img.shields.io/badge/Sand%20AI-Homepage-333333.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iODAwIiBoZWlnaHQ9IjgwMCIgdmlld0JveD0iMCAwIDgwMCA4MDAiIGZpbGw9Im5vbmUiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI+CjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJNMjI3IDIyNS4wODVDMjI3IDIwMi4zMDMgMjI3IDE5MC45MTIgMjMxLjQzNyAxODIuMjExQzIzNS4zMzkgMTc0LjU1NyAyNDEuNTY2IDE2OC4zMzQgMjQ5LjIyNiAxNjQuNDM0QzI1Ny45MzMgMTYwIDI2OS4zMzIgMTYwIDI5Mi4xMjkgMTYwSDUwNy44NzFDNTA5LjI5NSAxNjAgNTEwLjY3NiAxNjAgNTEyLjAxNCAxNjAuMDAxQzUzMi4wODIgMTYwLjAxNyA1NDIuNjExIDE2MC4yNzcgNTUwLjc3NCAxNjQuNDM0QzU1OC40MzQgMTY4LjMzNCA1NjQuNjYxIDE3NC41NTcgNTY4LjU2MyAxODIuMjExQzU3MyAxOTAuOTEyIDU3MyAyMDIuMzAzIDU3MyAyMjUuMDg1VjI1Ni41NThDNTczIDI5MS4zMTkgNTczIDMwOC43IDU2NS4wMzUgMzIzLjI3OUM1NTguNzU2IDMzNC43NzIgNTQzLjU2NSAzNDYuMTEgNTIzLjA3OCAzNTkuNjA1QzUxNC42NzQgMzY1LjE0MSA1MTAuNDcyIDM2Ny45MDkgNTA1LjYzOSAzNjcuOTM2QzUwMC44MDYgMzY3Ljk2NCA0OTYuNTAzIDM2NS4yIDQ4Ny44OTYgMzU5LjY3MUw0ODcuODk2IDM1OS42N0w0NjYuNDY5IDM0NS45MDVDNDU2Ljg3NSAzMzkuNzQyIDQ1Mi4wNzggMzM2LjY2IDQ1Mi4wNzggMzMyLjIxOEM0NTIuMDc4IDMyNy43NzcgNDU2Ljg3NSAzMjQuNjk1IDQ2Ni40NjkgMzE4LjUzMUw1MjYuNzgyIDI3OS43ODVDNTM1LjI5MSAyNzQuMzE5IDU0MC40MzUgMjY0LjkwMyA1NDAuNDM1IDI1NC43OTRDNTQwLjQzNSAyMzguMzg2IDUyNy4xMjUgMjI1LjA4NSA1MTAuNzA1IDIyNS4wODVIMjg5LjI5NUMyNzIuODc1IDIyNS4wODUgMjU5LjU2NSAyMzguMzg2IDI1OS41NjUgMjU0Ljc5NEMyNTkuNTY1IDI2NC45MDMgMjY0LjcwOSAyNzQuMzE5IDI3My4yMTggMjc5Ljc4NUw1MTMuMTggNDMzLjk0MUM1NDIuNDQxIDQ1Mi43MzggNTU3LjA3MSA0NjIuMTM3IDU2NS4wMzUgNDc2LjcxNkM1NzMgNDkxLjI5NCA1NzMgNTA4LjY3NSA1NzMgNTQzLjQzNlY1NzQuOTE1QzU3MyA1OTcuNjk3IDU3MyA2MDkuMDg4IDU2OC41NjMgNjE3Ljc4OUM1NjQuNjYxIDYyNS40NDQgNTU4LjQzNCA2MzEuNjY2IDU1MC43NzQgNjM1LjU2NkM1NDIuMDY3IDY0MCA1MzAuNjY4IDY0MCA1MDcuODcxIDY0MEgyOTIuMTI5QzI2OS4zMzIgNjQwIDI1Ny45MzMgNjQwIDI0OS4yMjYgNjM1LjU2NkMyNDEuNTY2IDYzMS42NjYgMjM1LjMzOSA2MjUuNDQ0IDIzMS40MzcgNjE3Ljc4OUMyMjcgNjA5LjA4OCAyMjcgNTk3LjY5NyAyMjcgNTc0LjkxNVY1NDMuNDM2QzIyNyA1MDguNjc1IDIyNyA0OTEuMjk0IDIzNC45NjUgNDc2LjcxNkMyNDEuMjQ0IDQ2NS4yMjIgMjU2LjQzMyA0NTMuODg2IDI3Ni45MTggNDQwLjM5MkMyODUuMzIyIDQzNC44NTYgMjg5LjUyNSA0MzIuMDg4IDI5NC4zNTcgNDMyLjA2QzI5OS4xOSA0MzIuMDMyIDMwMy40OTQgNDM0Ljc5NyAzMTIuMSA0NDAuMzI2TDMzMy41MjcgNDU0LjA5MUMzNDMuMTIyIDQ2MC4yNTQgMzQ3LjkxOSA0NjMuMzM2IDM0Ny45MTkgNDY3Ljc3OEMzNDcuOTE5IDQ3Mi4yMiAzNDMuMTIyIDQ3NS4zMDEgMzMzLjUyOCA0ODEuNDY1TDMzMy41MjcgNDgxLjQ2NUwyNzMuMjIgNTIwLjIwOEMyNjQuNzA5IDUyNS42NzUgMjU5LjU2NSA1MzUuMDkxIDI1OS41NjUgNTQ1LjIwMkMyNTkuNTY1IDU2MS42MTIgMjcyLjg3NyA1NzQuOTE1IDI4OS4yOTkgNTc0LjkxNUg1MTAuNzAxQzUyNy4xMjMgNTc0LjkxNSA1NDAuNDM1IDU2MS42MTIgNTQwLjQzNSA1NDUuMjAyQzU0MC40MzUgNTM1LjA5MSA1MzUuMjkxIDUyNS42NzUgNTI2Ljc4IDUyMC4yMDhMMjg2LjgyIDM2Ni4wNTNDMjU3LjU2IDM0Ny4yNTYgMjQyLjkyOSAzMzcuODU3IDIzNC45NjUgMzIzLjI3OUMyMjcgMzA4LjcgMjI3IDI5MS4zMTkgMjI3IDI1Ni41NThWMjI1LjA4NVoiIGZpbGw9IiNGRkZGRkYiLz4KPC9zdmc+Cg=="></a>
+ <a href="https://magi.sand.ai"><img alt="product" src="https://img.shields.io/badge/Magi-Product-logo.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iODAwIiBoZWlnaHQ9IjgwMCIgdmlld0JveD0iMCAwIDgwMCA4MDAiIGZpbGw9Im5vbmUiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI+CjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgY2xpcC1ydWxlPSJldmVub2RkIiBkPSJNNDY5LjAyNyA1MDcuOTUxVjE4MC4zNjRDNDY5LjAyNyAxNjguNDE2IDQ2OS4wMjcgMTYyLjQ0MiA0NjUuMjQ0IDE2MC41MTlDNDYxLjQ2MSAxNTguNTk2IDQ1Ni42NTkgMTYyLjEzIDQ0Ny4wNTYgMTY5LjE5OEwzNjEuMDQ4IDIzMi40OTZDMzQ2LjI5NiAyNDMuMzUzIDMzOC45MjEgMjQ4Ljc4MSAzMzQuOTQ3IDI1Ni42NUMzMzAuOTczIDI2NC41MTggMzMwLjk3MyAyNzMuNjk1IDMzMC45NzMgMjkyLjA0OVY2MTkuNjM2QzMzMC45NzMgNjMxLjU4NCAzMzAuOTczIDYzNy41NTggMzM0Ljc1NiA2MzkuNDgxQzMzOC41MzkgNjQxLjQwNCAzNDMuMzQxIDYzNy44NyAzNTIuOTQ0IDYzMC44MDJMNDM4Ljk1MiA1NjcuNTA0QzQ1My43MDQgNTU2LjY0OCA0NjEuMDggNTUxLjIxOSA0NjUuMDUzIDU0My4zNUM0NjkuMDI3IDUzNS40ODIgNDY5LjAyNyA1MjYuMzA1IDQ2OS4wMjcgNTA3Ljk1MVpNMjg3LjkwNyA0OTQuMTU1VjIyMS45M0MyODcuOTA3IDIxNC4wMDIgMjg3LjkwNyAyMTAuMDM5IDI4NS4zOTQgMjA4Ljc1NEMyODIuODgxIDIwNy40NyAyNzkuNjg0IDIwOS44MDEgMjczLjI5MiAyMTQuNDYyTDIwOS40MjEgMjYxLjAzMkMxOTguMjYyIDI2OS4xNjggMTkyLjY4MyAyNzMuMjM2IDE4OS42NzUgMjc5LjE2QzE4Ni42NjcgMjg1LjA4NCAxODYuNjY3IDI5Mi4wMDMgMTg2LjY2NyAzMDUuODQxVjU3OC4wNjdDMTg2LjY2NyA1ODUuOTk0IDE4Ni42NjcgNTg5Ljk1OCAxODkuMTggNTkxLjI0MkMxOTEuNjkzIDU5Mi41MjYgMTk0Ljg4OSA1OTAuMTk2IDIwMS4yODIgNTg1LjUzNUwyNjUuMTUyIDUzOC45NjVDMjc2LjMxMSA1MzAuODI5IDI4MS44OSA1MjYuNzYxIDI4NC44OTkgNTIwLjgzN0MyODcuOTA3IDUxNC45MTMgMjg3LjkwNyA1MDcuOTk0IDI4Ny45MDcgNDk0LjE1NVpNNjEzLjMzMyAyMjEuOTNWNDk0LjE1NUM2MTMuMzMzIDUwNy45OTQgNjEzLjMzMyA1MTQuOTEzIDYxMC4zMjUgNTIwLjgzN0M2MDcuMzE3IDUyNi43NjEgNjAxLjczOCA1MzAuODI5IDU5MC41NzkgNTM4Ljk2NUw1MjYuNzA4IDU4NS41MzVDNTIwLjMxNiA1OTAuMTk2IDUxNy4xMTkgNTkyLjUyNiA1MTQuNjA2IDU5MS4yNDJDNTEyLjA5MyA1ODkuOTU4IDUxMi4wOTMgNTg1Ljk5NCA1MTIuMDkzIDU3OC4wNjdWMzA1Ljg0MUM1MTIuMDkzIDI5Mi4wMDMgNTEyLjA5MyAyODUuMDg0IDUxNS4xMDIgMjc5LjE2QzUxOC4xMSAyNzMuMjM2IDUyMy42ODkgMjY5LjE2OCA1MzQuODQ4IDI2MS4wMzJMNTk4LjcxOSAyMTQuNDYyQzYwNS4xMTEgMjA5LjgwMSA2MDguMzA3IDIwNy40NyA2MTAuODIgMjA4Ljc1NEM2MTMuMzMzIDIxMC4wMzkgNjEzLjMzMyAyMTQuMDAyIDYxMy4zMzMgMjIxLjkzWiIgZmlsbD0iI0ZGRkZGRiIgc2hhcGUtcmVuZGVyaW5nPSJjcmlzcEVkZ2VzIi8+Cjwvc3ZnPgo=&color=DCBE7E"></a>
+ <a href="https://huggingface.co/sand-ai"><img alt="Hugging Face"
+ src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Sand AI-ffc107?color=ffc107&logoColor=white"/></a>
+ <a href="https://x.com/SandAI_HQ"><img alt="Twitter Follow"
+ src="https://img.shields.io/badge/Twitter-Sand%20AI-white?logo=x&logoColor=white"/></a>
+ <a href="https://discord.gg/hgaZ86D7Wv"><img alt="Discord"
+ src="https://img.shields.io/badge/Discord-Sand%20AI-7289da?logo=discord&logoColor=white&color=7289da"/></a>
+ <a href="https://github.com/SandAI-org/Magi/LICENSE"><img alt="license" src="https://img.shields.io/badge/License-Apache2.0-green?logo=Apache"></a>
+ </p>
+
+ # MAGI-1: Autoregressive Video Generation at Scale
+
+ This repository contains pre-trained weights and inference code for the MAGI-1 model. You can find more information in our [technical report](https://static.magi.world/static/files/MAGI_1.pdf), or directly create magic with MAGI-1 [here](http://sand.ai). 🚀✨
+
+ ## 🔥🔥🔥 Latest News
+
+ - Apr 21, 2025: MAGI-1 is here 🎉. We've released the model weights and inference code — check it out!
+
+ ## 1. About
+
+ We present MAGI-1, a world model that generates videos by ***autoregressively*** predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control. We believe MAGI-1 offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.
+
+ <div align="center">
+ <video src="https://github.com/user-attachments/assets/5cfa90e0-f6ed-476b-a194-71f1d309903a" width="70%" poster=""> </video>
  </div>

+ ## 2. Model Summary
+
+ ### Transformer-based VAE
+
+ - Variational autoencoder (VAE) with a transformer-based architecture, 8x spatial and 4x temporal compression (see the shape sketch below).
+ - Fastest average decoding time and highly competitive reconstruction quality.
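+
+ A minimal shape sketch in Python of what these compression ratios imply; the clip size, the tensor layout, and the 16-channel latent are illustrative assumptions, not the repository's actual VAE interface:
+
+ ```python
+ # One 96-frame 720x1280 RGB clip (four 24-frame chunks),
+ # laid out as [batch, channels, frames, height, width].
+ video_shape = (1, 3, 96, 720, 1280)
+
+ # 4x temporal and 8x spatial compression shrink the frame/height/width axes;
+ # the latent channel count (16 here) is an assumed value for illustration.
+ b, c, t, h, w = video_shape
+ latent_shape = (b, 16, t // 4, h // 8, w // 8)
+ print(latent_shape)  # (1, 16, 24, 90, 160)
+ ```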
+
+ ### Auto-Regressive Denoising Algorithm
+
+ MAGI-1 is an autoregressive denoising video generation model that generates videos chunk by chunk instead of as a whole. Each chunk (24 frames) is denoised holistically, and generation of the next chunk begins as soon as the current one reaches a certain level of denoising. This pipelined design enables concurrent processing of up to four chunks for efficient video generation.

+ ![auto-regressive denoising algorithm](figures/algorithm.png)
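+
+ The scheduling idea can be sketched in a few lines of Python. This is an illustrative toy, not the repository's scheduler: the step count, the admission gap, and the function name are assumptions; only the behavior of keeping at most four chunks in flight, each started once its predecessor is partially denoised, mirrors the description above.
+
+ ```python
+ def denoise_schedule(num_chunks: int, total_steps: int = 16, start_gap: int = 4, max_active: int = 4):
+     """Return, for each tick, the list of chunks that receive one denoising step."""
+     progress = [0] * num_chunks            # denoising steps completed per chunk
+     started = [False] * num_chunks
+     started[0] = True
+     timeline = []
+     while progress[-1] < total_steps:
+         active = [i for i in range(num_chunks) if started[i] and progress[i] < total_steps]
+         timeline.append(active)
+         for i in active:                   # every in-flight chunk is denoised concurrently
+             progress[i] += 1
+         nxt = started.index(False) if False in started else None
+         # admit the next chunk once its predecessor is partially denoised and a slot is free
+         if nxt is not None and progress[nxt - 1] >= start_gap and len(active) < max_active:
+             started[nxt] = True
+     return timeline
+
+ for tick, chunks in enumerate(denoise_schedule(num_chunks=6)):
+     print(f"tick {tick:02d}: denoising chunks {chunks}")
+ ```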

+ ### Diffusion Model Architecture

+ MAGI-1 is built upon the Diffusion Transformer, incorporating several key innovations to enhance training efficiency and stability at scale. These advancements include Block-Causal Attention, the Parallel Attention Block, QK-Norm and GQA, Sandwich Normalization in the FFN, SwiGLU, and Softcap Modulation. For more details, please refer to the [technical report](https://static.magi.world/static/files/MAGI_1.pdf).
+ <div align="center">
+ <img src="figures/dit_architecture.png" alt="diffusion model architecture" width="500" />
+ </div>
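+
+ As a toy illustration of a block-causal attention pattern (tokens attend bidirectionally within their own chunk and causally to earlier chunks), here is how such a mask can be built; the chunk count and size are arbitrary examples, not the model's actual configuration:
+
+ ```python
+ import torch
+
+ def block_causal_mask(num_chunks: int, tokens_per_chunk: int) -> torch.Tensor:
+     """Boolean mask where True means attention is allowed."""
+     n = num_chunks * tokens_per_chunk
+     chunk_id = torch.arange(n) // tokens_per_chunk
+     # query token i may attend to key token j iff j's chunk is not later than i's chunk
+     return chunk_id[:, None] >= chunk_id[None, :]
+
+ print(block_causal_mask(num_chunks=3, tokens_per_chunk=2).int())
+ ```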
+
+ ### Distillation Algorithm
+
+ We adopt a shortcut distillation approach that trains a single velocity-based model to support variable inference budgets. By enforcing a self-consistency constraint—equating one large step with two smaller steps—the model learns to approximate flow-matching trajectories across multiple step sizes. During training, step sizes are cyclically sampled from {64, 32, 16, 8}, and classifier-free guidance distillation is incorporated to preserve conditional alignment. This enables efficient inference with minimal loss in fidelity.
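+
+ The self-consistency constraint can be sketched as follows. This is a schematic reading of shortcut-style distillation under assumed conventions (a velocity model conditioned on the step size, and mean-squared matching of one 2d step against two d steps), not the training code from the technical report:
+
+ ```python
+ import torch
+
+ def shortcut_consistency_loss(model, x, t, d):
+     """One velocity step of size 2d should match two chained steps of size d."""
+     with torch.no_grad():                    # target branch: two small steps
+         v1 = model(x, t, d)
+         x_mid = x + d * v1
+         v2 = model(x_mid, t + d, d)
+         v_target = (v1 + v2) / 2             # average velocity over the 2d interval
+     v_big = model(x, t, 2 * d)               # student branch: one large step
+     return torch.mean((v_big - v_target) ** 2)
+
+ # toy check with a dummy velocity field v(x, t) = -x
+ dummy = lambda x, t, d: -x
+ print(shortcut_consistency_loss(dummy, torch.randn(2, 8), t=torch.zeros(2, 1), d=1 / 16))
+ ```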

+ ## 3. Model Zoo
+
+ We provide the pre-trained weights for MAGI-1, including the 24B and 4.5B models, as well as the corresponding distill and distill+quant models. The model weight links are shown in the table.
+
  | Model | Link | Recommended Machine |
  | ----------------------------- | ------------------------------------------------------------ | ------------------------------- |
+ | T5 | [T5](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/t5) | - |
+ | MAGI-1-VAE | [MAGI-1-VAE](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/vae) | - |
+ | MAGI-1-24B | [MAGI-1-24B](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_base) | H100/H800 \* 8 |
+ | MAGI-1-24B-distill | [MAGI-1-24B-distill](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_distill) | H100/H800 \* 8 |
+ | MAGI-1-24B-distill+fp8_quant | [MAGI-1-24B-distill+quant](https://huggingface.co/sand-ai/MAGI-1/tree/main/ckpt/magi/24B_distill_quant) | H100/H800 \* 4 or RTX 4090 \* 8 |
+ | MAGI-1-4.5B | MAGI-1-4.5B | RTX 4090 \* 1 |
+
+ ## 4. Evaluation
+
+ ### In-house Human Evaluation
+
+ MAGI-1 achieves state-of-the-art performance among open-source models (surpassing Wan-2.1 and significantly outperforming Hailuo and HunyuanVideo), particularly excelling in instruction following and motion quality, positioning it as a strong potential competitor to closed-source commercial models such as Kling.
+
+ ![inhouse human evaluation](figures/inhouse_human_evaluation.png)
+
+ ### Physical Evaluation
+
+ Thanks to the natural advantages of its autoregressive architecture, MAGI-1 achieves far superior precision in predicting physical behavior through video continuation, significantly outperforming all existing models.

+ | Model | Phys. IQ Score ↑ | Spatial IoU ↑ | Spatio-Temporal ↑ | Weighted Spatial IoU ↑ | MSE ↓ |
+ |----------------|------------------|---------------|-------------------|-------------------------|--------|
+ | **V2V Models** | | | | | |
+ | **Magi (V2V)** | **56.02** | **0.367** | **0.270** | **0.304** | **0.005** |
+ | VideoPoet (V2V)| 29.50 | 0.204 | 0.164 | 0.137 | 0.010 |
+ | **I2V Models** | | | | | |
+ | **Magi (I2V)** | **30.23** | **0.203** | **0.151** | **0.154** | **0.012** |
+ | Kling1.6 (I2V) | 23.64 | 0.197 | 0.086 | 0.144 | 0.025 |
+ | VideoPoet (I2V)| 20.30 | 0.141 | 0.126 | 0.087 | 0.012 |
+ | Gen 3 (I2V) | 22.80 | 0.201 | 0.115 | 0.116 | 0.015 |
+ | Wan2.1 (I2V) | 20.89 | 0.153 | 0.100 | 0.112 | 0.023 |
+ | Sora (I2V) | 10.00 | 0.138 | 0.047 | 0.063 | 0.030 |
+ | **GroundTruth**| **100.0** | **0.678** | **0.535** | **0.577** | **0.002** |
+
+
+ ## 5. How to Run
+
+ ### Environment Preparation
+
+ We provide two ways to run MAGI-1, with the Docker environment being the recommended option.
+
+ **Run with Docker Environment (Recommended)**

  ```bash
+ docker pull sandai/magi:latest

  docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=6710886 sandai/magi:latest /bin/bash
  ```

+ **Run with Source Code**

  ```bash
  # Create a new environment
  conda create -n magi python==3.10.12
+
  # Install pytorch
  conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
+
  # Install other dependencies
  pip install -r requirements.txt
+
+ # Install ffmpeg
+ conda install -c conda-forge ffmpeg=4.4
+
+ # Install MagiAttention; for more information, please refer to https://github.com/SandAI-org/MagiAttention
+ git clone git@github.com:SandAI-org/MagiAttention.git
+ cd MagiAttention
+ git submodule update --init --recursive
+ pip install --no-build-isolation .
  ```

+ ### Inference Command
+
+ To run the `MagiPipeline`, you can control the input and output by modifying the parameters in the `example/24B/run.sh` or `example/4.5B/run.sh` script. Below is an explanation of the key parameters:
+
+ #### Parameter Descriptions
+
+ - `--config_file`: Specifies the path to the configuration file, which contains model configuration parameters, e.g., `example/24B/24B_config.json`.
+ - `--mode`: Specifies the mode of operation. Available options are:
+   - `t2v`: Text to Video
+   - `i2v`: Image to Video
+   - `v2v`: Video to Video
+ - `--prompt`: The text prompt used for video generation, e.g., `"Good Boy"`.
+ - `--image_path`: Path to the image file, used only in `i2v` mode.
+ - `--prefix_video_path`: Path to the prefix video file, used only in `v2v` mode.
+ - `--output_path`: Path where the generated video file will be saved.
+
+ #### Bash Script

  ```bash
+ #!/bin/bash
+ # Run 24B MAGI-1 model
  bash example/24B/run.sh

+ # Run 4.5B MAGI-1 model
  bash example/4.5B/run.sh
  ```

+ #### Customizing Parameters
+
+ You can modify the parameters in `run.sh` as needed. For example:
+
+ - To use the Image to Video mode (`i2v`), set `--mode` to `i2v` and provide `--image_path`:
+ ```bash
+ --mode i2v \
+ --image_path example/assets/image.jpeg \
+ ```
+
+ - To use the Video to Video mode (`v2v`), set `--mode` to `v2v` and provide `--prefix_video_path`:
+ ```bash
+ --mode v2v \
+ --prefix_video_path example/assets/prefix_video.mp4 \
+ ```
+
+ By adjusting these parameters, you can flexibly control the input and output to meet different requirements.
+
+ ### Some Useful Configs (for config.json)

  | Config | Help |
  | -------------- | ------------------------------------------------------------ |

  | vae_pretrained | Path to load pretrained VAE model |

+ ## 6. License
+
+ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
+
+ ## 7. Citation
+
+ If you find our code or model useful in your research, please cite:
+
+ ```bibtex
+ @misc{magi1,
+   title={MAGI-1: Autoregressive Video Generation at Scale},
+   author={Sand-AI},
+   year={2025},
+   url={https://static.magi.world/static/files/MAGI_1.pdf},
+ }
  ```

+ ## 8. Contact
+
+ If you have any questions, please feel free to raise an issue or contact us at [[email protected]]([email protected]).