Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Overview
Matrix-Game-2.0 (1.8B) is an interactive world model that generates long videos on the fly via few-step auto-regressive diffusion.
Key Features
- Feature 1: Real-Time Distillation. Efficient few-step diffusion for streaming video synthesis at 25 FPS, producing minute-long, high-fidelity videos across complex environments.
- Feature 2: Precise Action Injection. A mouse/keyboard-to-frame module that embeds user inputs as direct interactions, enabling frame-level control and dynamic response in generated videos.
- Feature 3: Massive Interactive Data Pipeline. A scalable production system for Unreal Engine and GTA5 that generates ~1200 hours of high-quality interactive video data, covering diverse scenes with frame-level realism.
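To make the frame-level control and the 25 FPS figure concrete, here is a minimal sketch. It assumes a keyboard input encoded as a multi-hot vector and a mouse input encoded as a continuous (dx, dy) offset, one action per frame. The key set `KEYS`, the function names, and the vector layout are illustrative assumptions, not the model's actual interface.

```python
from typing import Iterable, List, Tuple

FPS = 25  # streaming rate quoted in the feature list
KEYS = ("W", "A", "S", "D")  # hypothetical key set, for illustration only


def frame_budget_ms(fps: int = FPS) -> float:
    """Wall-clock time available to produce one frame at the given rate."""
    return 1000.0 / fps


def frames_for(seconds: float, fps: int = FPS) -> int:
    """Number of frames needed to cover `seconds` of video."""
    return int(round(seconds * fps))


def encode_action(pressed: Iterable[str], mouse_delta: Tuple[float, float]) -> List[float]:
    """Encode one frame's input as [multi-hot keys..., dx, dy]."""
    pressed_set = {k.upper() for k in pressed}
    unknown = pressed_set - set(KEYS)
    if unknown:
        raise ValueError(f"unsupported keys: {sorted(unknown)}")
    keyboard = [1.0 if k in pressed_set else 0.0 for k in KEYS]
    dx, dy = mouse_delta
    return keyboard + [float(dx), float(dy)]


if __name__ == "__main__":
    print(frame_budget_ms())   # time budget per frame, in milliseconds
    print(frames_for(60))      # frames in one minute of video
    print(encode_action(["w", "d"], (0.1, -0.05)))
```

At 25 FPS each frame has a 40 ms budget, and a one-minute video is 1500 frames, which is why few-step sampling matters for real-time use.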
Latest Updates
- [2025-08] Initial release of the Matrix-Game-2.0 model
Model Overview
Matrix-Game-2.0 (1.8B) is derived from Wan. With the text branch removed and action modules added, the model predicts the next frames from visual content and the corresponding actions alone.
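The idea of predicting next frames from past frames and actions can be sketched as a simple auto-regressive loop. This is a toy illustration under stated assumptions: `rollout`, `toy_predictor`, and the `context_len` window are hypothetical names, frames are plain lists rather than tensors, and the predictor is a stand-in for the few-step diffusion model, not the repository's implementation.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Frame = List[float]   # stand-in for an image or latent; a real frame is a tensor
Action = List[float]  # stand-in for an encoded keyboard/mouse input


def rollout(
    first_frame: Frame,
    actions: Sequence[Action],
    predict_next: Callable[[Sequence[Frame], Action], Frame],
    context_len: int = 4,  # hypothetical context window size
) -> List[Frame]:
    """Generate one frame per action, feeding each output back as context."""
    context: Deque[Frame] = deque([first_frame], maxlen=context_len)
    video = [first_frame]
    for action in actions:
        # No text prompt: the only inputs are past frames and the current action.
        frame = predict_next(list(context), action)
        context.append(frame)  # oldest frame drops out once the window is full
        video.append(frame)
    return video


def toy_predictor(context: Sequence[Frame], action: Action) -> Frame:
    """Toy stand-in for the model: shift the last frame by the action."""
    last = context[-1]
    return [p + a for p, a in zip(last, action)]


if __name__ == "__main__":
    frames = rollout([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], toy_predictor)
    print(frames)
```

Because each new frame depends only on a bounded window of earlier frames plus the current action, the loop can in principle run for as long as actions keep arriving, which is what streaming generation requires.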
Performance Comparison
GameWorld Score Benchmark Comparison
Model | Image Quality ↑ | Aesthetic Quality ↑ | Temporal Cons. ↑ | Motion Smooth. ↑ | Keyboard Acc. ↑ | Mouse Acc. ↑ | Object Cons. | Scenario Cons. |
---|---|---|---|---|---|---|---|---|
Oasis | 0.27 | 0.27 | 0.82 | 0.99 | 0.73 | 0.56 | 0.18 | 0.84 |
Ours | 0.61 | 0.50 | 0.94 | 0.98 | 0.91 | 0.95 | 0.64 | 0.80 |
Metric Descriptions:
- Image Quality / Aesthetic Quality: Visual fidelity and perceptual appeal of generated frames
- Temporal Consistency / Motion Smoothness: Temporal coherence and smoothness between frames
- Keyboard Accuracy / Mouse Accuracy: Accuracy in following user control signals
- Object Consistency: Geometric stability and consistency of objects over time
- Scenario Consistency: Stability of the overall scene over time
See our GameWorld benchmark for implementation details.
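For a quick reading of the table, the short script below computes the per-metric gap between the two rows. The scores are copied from the table above; the helper names are illustrative, and treating higher as better for every column (including Object Cons. and Scenario Cons., which carry no arrow in the header) is an assumption.

```python
from typing import Dict, List

METRICS = [
    "Image Quality", "Aesthetic Quality", "Temporal Cons.", "Motion Smooth.",
    "Keyboard Acc.", "Mouse Acc.", "Object Cons.", "Scenario Cons.",
]

# Scores copied from the GameWorld Score table.
SCORES: Dict[str, List[float]] = {
    "Oasis": [0.27, 0.27, 0.82, 0.99, 0.73, 0.56, 0.18, 0.84],
    "Ours":  [0.61, 0.50, 0.94, 0.98, 0.91, 0.95, 0.64, 0.80],
}


def deltas(a: str = "Ours", b: str = "Oasis") -> Dict[str, float]:
    """Per-metric difference a - b, rounded to the table's two decimals."""
    return {m: round(x - y, 2) for m, x, y in zip(METRICS, SCORES[a], SCORES[b])}


def wins(a: str = "Ours", b: str = "Oasis") -> List[str]:
    """Metrics on which `a` scores strictly higher than `b`."""
    return [m for m, d in deltas(a, b).items() if d > 0]


if __name__ == "__main__":
    for metric, diff in deltas().items():
        print(f"{metric:18s} {diff:+.2f}")
    print("Ours higher on", len(wins()), "of", len(METRICS), "metrics")
```

Under that assumption, the largest gaps are in Object Cons. and Mouse Acc., while Oasis is marginally higher on Motion Smooth. and Scenario Cons.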
Quick Start
# clone the repository:
git clone https://github.com/SkyworkAI/Matrix-Game.git
cd Matrix-Game/Matrix-Game-2
# install apex and FlashAttention (https://github.com/Dao-AILab/flash-attention), which this project depends on
# install dependencies:
pip install -r requirements.txt
python setup.py develop
# inference
python inference.py \
--config_path configs/inference_yaml/{your-config}.yaml \
--checkpoint_path {path-to-the-checkpoint} \
--img_path {path-to-the-input-image} \
--output_folder outputs \
--num_output_frames 150 \
--seed 42 \
--pretrained_model_path {path-to-the-vae-folder}
# inference streaming
python inference_streaming.py \
--config_path configs/inference_yaml/{your-config}.yaml \
--checkpoint_path {path-to-the-checkpoint} \
--output_folder outputs \
--seed 42 \
--pretrained_model_path {path-to-the-vae-folder}
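When launching runs from a script, the two commands above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository. It only uses the script names and flags shown above, and it mirrors the streaming example by omitting `--img_path` and `--num_output_frames` there; whether the streaming script accepts those flags is not stated here. The paths in the usage example are placeholders.

```python
import shlex
from typing import List, Optional


def build_inference_command(
    config_path: str,
    checkpoint_path: str,
    pretrained_model_path: str,
    img_path: Optional[str] = None,
    output_folder: str = "outputs",
    num_output_frames: Optional[int] = None,
    seed: int = 42,
    streaming: bool = False,
) -> List[str]:
    """Build the argv list for inference.py or inference_streaming.py."""
    script = "inference_streaming.py" if streaming else "inference.py"
    cmd = ["python", script,
           "--config_path", config_path,
           "--checkpoint_path", checkpoint_path]
    if not streaming:
        # The non-streaming example above passes an input image.
        if img_path is None:
            raise ValueError("img_path is required for non-streaming inference")
        cmd += ["--img_path", img_path]
    cmd += ["--output_folder", output_folder]
    if not streaming and num_output_frames is not None:
        cmd += ["--num_output_frames", str(num_output_frames)]
    cmd += ["--seed", str(seed), "--pretrained_model_path", pretrained_model_path]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own config, checkpoint, image, and VAE folder.
    cmd = build_inference_command(
        config_path="path/to/config.yaml",
        checkpoint_path="path/to/checkpoint",
        pretrained_model_path="path/to/vae-folder",
        img_path="path/to/input-image.png",
        num_output_frames=150,
    )
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```

Building an argument list rather than a shell string avoids quoting problems when paths contain spaces.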
Acknowledgements
We would like to express our gratitude to:
- Diffusers for their excellent diffusion model framework
- SkyReels-V2 for their strong base model
- Self-Forcing for their excellent work
- MineRL for their excellent gym framework
- Video-Pre-Training for their accurate Inverse Dynamics Model
- GameFactory for their idea of action control module
We are grateful to the broader research community for their open exploration and contributions to the field of interactive world generation.
Citation
If you find this project useful, please cite our paper:
Model tree for Skywork/Matrix-Game-2.0
Base model
Skywork/SkyReels-V2-I2V-1.3B-540P