Wan2.1-T2V-14B-StepDistill-CfgDistill

Overview

Wan2.1-T2V-14B-StepDistill-CfgDistill is a text-to-video generation model built on the Wan2.1-T2V-14B foundation model. As the name indicates, it applies step distillation and classifier-free guidance (CFG) distillation to the base model. The distilled model generates videos in far fewer inference steps (4 or 8) and without classifier-free guidance, which substantially reduces generation time while maintaining high-quality output.
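To make the saving concrete, here is a minimal Python sketch that counts denoiser forward passes per video. It is an illustration, not code from this repository: the function names `cfg_combine` and `forward_passes` are hypothetical, and the 50-step CFG baseline is an assumed setting for the undistilled model. With standard classifier-free guidance, each sampling step runs the denoiser twice (conditional and unconditional) and blends the two predictions; a CFG-distilled model needs only one pass per step.

```python
def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance blend of two denoiser predictions.

    Needs BOTH a conditional and an unconditional forward pass per step,
    which is the cost that CFG distillation removes.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def forward_passes(num_steps: int, use_cfg: bool) -> int:
    """Count denoiser forward passes for one sampling run."""
    passes_per_step = 2 if use_cfg else 1
    return num_steps * passes_per_step


# Assumption: a 50-step baseline with CFG for the undistilled model.
baseline = forward_passes(num_steps=50, use_cfg=True)
distilled_4 = forward_passes(num_steps=4, use_cfg=False)
distilled_8 = forward_passes(num_steps=8, use_cfg=False)

print(baseline, distilled_4, distilled_8)  # 100 4 8
print(baseline / distilled_4)              # 25.0
```

The ratio covers denoiser calls only. End-to-end speedup is typically smaller, since text encoding and VAE decoding are not affected by the distillation.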

Video Demos

Demos (4 steps)

Training

Our training code is adapted from the Self-Forcing repository. We extended it to support the Wan2.1-T2V-14B model and performed 4-step bidirectional distillation. The modified code is available at Self-Forcing-Plus.

Inference

Inference uses lightx2v, an efficient inference engine that supports multiple models. It accelerates video generation while maintaining high-quality output. Run the distilled model with:

bash scripts/run_wan_t2v_distill.sh

License Agreement

The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use it while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the license.

Acknowledgements

We would like to thank the contributors to the Wan2.1 and Self-Forcing repositories for their open research.
