---
license: apache-2.0
---
|
|
|
|
|
# Magi-1: Autoregressive Video Generation as a Scalable World Model
|
|
|
<!-- TODO: add image -->

<div align="center" style="margin-top: 0px; margin-bottom: 0px;">

<img src=https://github.com/user-attachments/.... width="30%"/>

<!-- Add the official image here -->

</div>
|
|
|
-----
|
|
|
This repository contains the pre-trained weights and inference code for the Magi-1 model. You can find more information on our [project page](http://sand.ai).
|
|
|
|
|
## 1. Introduction
|
|
|
We present Magi-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, Magi-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, with high temporal consistency and scalability made possible by several algorithmic innovations and a dedicated infrastructure stack. Magi-1 further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control. We believe Magi-1 offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.
|
|
|
|
|
## 2. Model and Checkpoints
|
|
|
We provide pre-trained weights for Magi-1, including the 24B and 4.5B models, as well as the corresponding distilled and distilled+quantized variants. The model weight links are listed in the table below.
|
|
|
| Model                         | Link                                                         | Recommended Machine             |
| ----------------------------- | ------------------------------------------------------------ | ------------------------------- |
| Magi-1-24B                    | [Magi-1-24B](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_base) | H100/H800 \* 8                  |
| Magi-1-24B-distill            | [Magi-1-24B-distill](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_distill) | H100/H800 \* 8                  |
| Magi-1-24B-distill+fp8_quant  | [Magi-1-24B-distill+quant](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_distill_quant) | H100/H800 \* 4 or RTX 4090 \* 8 |
| Magi-1-4.5B                   | Magi-1-4.5B (Coming Soon)                                    | RTX 4090 \* 1                   |
| Magi-1-4.5B-distill           | Magi-1-4.5B-distill (Coming Soon)                            | RTX 4090 \* 1                   |
| Magi-1-4.5B-distill+fp8_quant | Magi-1-4.5B-distill+fp8_quant (Coming Soon)                  | RTX 4090 \* 1                   |
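
If you prefer to fetch a checkpoint from the command line, the sketch below uses the Hugging Face CLI. The include pattern and local target directory are assumptions; adjust them to the model and path you actually want.

```bash
# Minimal sketch, assuming the Hugging Face CLI is acceptable for downloads;
# the include pattern and --local-dir target are placeholders to adjust.
pip install -U "huggingface_hub[cli]"
huggingface-cli download sand-ai/Magi-1 \
  --include "ckpt/magi/24B_base/*" \
  --local-dir ./Magi-1
```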
|
|
|
|
|
## 3. How to Run

### 3.1 Environment Preparation

We provide two ways to run Magi-1, with the Docker environment being the recommended option.

**Run with Docker environment (Recommended)**
|
|
|
```bash
docker pull sandai/magi:latest

docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 sandai/magi:latest /bin/bash
```
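
If your model weights live on the host machine, you will probably want to mount them into the container. A minimal sketch, assuming a host checkpoint directory and a mount point that you should replace with your own paths:

```bash
# Minimal sketch: the same docker run as above, plus a bind mount for checkpoints.
# /path/to/Magi-1/ckpt and /workspace/ckpt are assumptions; use your own paths.
docker run -it --gpus all --privileged --shm-size=32g --name magi \
  --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -v /path/to/Magi-1/ckpt:/workspace/ckpt \
  sandai/magi:latest /bin/bash
```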
|
|
|
**Run with source code**

```bash
# Create a new environment
conda create -n magi python==3.10.12

# Install pytorch
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install other dependencies
pip install -r requirements.txt

# Install the prebuilt FlashAttention-3 wheel
pip install --no-cache-dir "https://python-artifacts.oss-cn-shanghai.aliyuncs.com/flash_attn_3-3.0.0b2-cp310-cp310-linux_x86_64.whl" --no-deps
```
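
A quick sanity check after installation; this is generic PyTorch rather than anything Magi-1 specific, and simply confirms that the CUDA build can see your GPUs:

```bash
# Verify the environment: PyTorch version, CUDA availability, and GPU count.
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
```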
|
|
|
### 3.2 Inference Command
|
|
|
```bash
# Run 24B Magi-1 model
bash example/24B/run.sh

# Run 4.5B Magi-1 model
bash example/4.5B/run.sh
```
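
Before launching, it can help to confirm where the run script points for checkpoints. A minimal sketch; the names searched for are assumptions taken from the configs in section 3.3 and may be spelled differently inside the script:

```bash
# Show the lines of the run script that reference checkpoint paths
# (the searched names are assumptions; see the configs in section 3.3).
grep -nE "load|t5_pretrained|vae_pretrained" example/24B/run.sh
```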
|
|
|
### 3.3 Useful Configs
|
|
|
| Config         | Description                                                  |
| -------------- | ------------------------------------------------------------ |
| seed           | Random seed used for video generation                        |
| video_size_h   | Height of the generated video                                |
| video_size_w   | Width of the generated video                                 |
| num_frames     | Controls the duration of the generated video                 |
| fps            | Frames per second; 4 video frames correspond to 1 latent frame |
| cfg_number     | Classifier-free guidance setting: use cfg_number=2 for the base model and cfg_number=1 for the distill and quant models |
| load           | Directory containing the model checkpoint                    |
| t5_pretrained  | Path to the pretrained T5 model                              |
| vae_pretrained | Path to the pretrained VAE model                             |
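
As an illustration of how these configs fit together, the sketch below lists hypothetical values; the authoritative flag names, defaults, and launcher live in the run scripts under example/, so treat every value here as an assumption.

```bash
# Hypothetical values only; edit example/24B/run.sh (or your own launcher)
# to apply the real settings.
EXAMPLE_ARGS="
  --seed 1234
  --video_size_h 720
  --video_size_w 1280
  --num_frames 96
  --fps 24
  --cfg_number 2
  --load ckpt/magi/24B_base
  --t5_pretrained ckpt/t5
  --vae_pretrained ckpt/vae
"
echo "Example configuration: ${EXAMPLE_ARGS}"
```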
|
|
|
|
|
## 4. Acknowledgements

## 5. Contact

Please feel free to cite our paper if you find our code or model useful in your research.

```
```

If you have any questions, please feel free to raise an issue.
|
|