Add link to paper, code, and project page
This PR ensures the model card links to the GitHub code repository, the technical report, and the project page, so people can more easily find all the resources.
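For reviewers who want to sanity-check the updated card programmatically, here is a minimal sketch using `huggingface_hub` (the repo id `Wan-AI/Wan2.1-T2V-14B` is an assumption based on the T2V-14B model this README describes; adjust to the actual repo):

```python
# Minimal sketch: load the model card and inspect the fields this PR touches.
# Assumed repo id; substitute the repo this PR targets.
from huggingface_hub import ModelCard

card = ModelCard.load("Wan-AI/Wan2.1-T2V-14B")

# Front-matter fields reordered/kept by the diff below.
print(card.data.license)        # expected: apache-2.0
print(card.data.library_name)   # expected: diffusers
print(card.data.pipeline_tag)   # expected: text-to-video

# Confirm the newly added resource links appear in the card body.
for needle in ("github.com/Wan-Video/Wan2.1",
               "huggingface.co/papers/2503.20314",
               "wan.video"):
    assert needle in card.text, f"missing link: {needle}"
```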
README.md (CHANGED):
@@ -1,16 +1,17 @@
 ---
-license: apache-2.0
 language:
 - en
 - zh
+library_name: diffusers
+license: apache-2.0
 pipeline_tag: text-to-video
 tags:
 - video generation
-library_name: diffusers
 inference:
   parameters:
     num_inference_steps: 10
 ---
+
 # Wan2.1
 
 <p align="center">
@@ -18,7 +19,7 @@ inference:
 <p>
 
 <p align="center">
-    💜 <a href=""><b>Wan</b></a>    |    🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a>    |   🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>   |    📑 <a href="">Paper (Coming soon)</a>    |    📑 <a href="https://
+    💜 <a href="https://wan.video"><b>Wan</b></a>    |    🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a>    |   🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>   |    📑 <a href="https://huggingface.co/papers/2503.20314">Paper (Coming soon)</a>    |    📑 <a href="https://wan.video/welcome?spm=a2ty_o02.30011076.0.0.6c9ee41eCcluqg">Blog</a>    |   💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat Group</a>   |    📖 <a href="https://discord.gg/AKNgpMK4Yj">Discord</a>
 <br>
 
 -----
@@ -68,13 +69,13 @@ This repository features our T2V-14B model, which establishes a new SOTA perform
 
 #### Installation
 Clone the repo:
-```
+```sh
 git clone https://github.com/Wan-Video/Wan2.1.git
 cd Wan2.1
 ```
 
 Install dependencies:
-```
+```sh
 # Ensure torch >= 2.4.0
 pip install -r requirements.txt
 ```
@@ -142,13 +143,13 @@ To facilitate implementation, we will start with a basic version of the inferenc
 
 - Single-GPU inference
 
-```
+```sh
 python generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
 ```
 
 If you encounter OOM (Out-of-Memory) issues, you can use the `--offload_model True` and `--t5_cpu` options to reduce GPU memory usage. For example, on an RTX 4090 GPU:
 
-```
+```sh
 python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
 ```
 
@@ -157,7 +158,7 @@ python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B
 
 - Multi-GPU inference using FSDP + xDiT USP
 
-```
+```sh
 pip install "xfuser>=0.4.1"
 torchrun --nproc_per_node=8 generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
 ```
@@ -172,7 +173,7 @@ Extending the prompts can effectively enrich the details in the generated videos
 - Configure the environment variable `DASH_API_KEY` to specify the Dashscope API key. For users of Alibaba Cloud's international site, you also need to set the environment variable `DASH_API_URL` to 'https://dashscope-intl.aliyuncs.com/api/v1'. For more detailed instructions, please refer to the [dashscope document](https://www.alibabacloud.com/help/en/model-studio/developer-reference/use-qwen-by-calling-api?spm=a2c63.p38356.0.i1).
 - Use the `qwen-plus` model for text-to-video tasks and `qwen-vl-max` for image-to-video tasks.
 - You can modify the model used for extension with the parameter `--prompt_extend_model`. For example:
-```
+```sh
 DASH_API_KEY=your_key python generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'dashscope' --prompt_extend_target_lang 'ch'
 ```
@@ -184,13 +185,13 @@ DASH_API_KEY=your_key python generate.py --task t2v-14B --size 1280*720 --ckpt_
 - Larger models generally provide better extension results but require more GPU memory.
 - You can modify the model used for extension with the parameter `--prompt_extend_model` , allowing you to specify either a local model path or a Hugging Face model. For example:
 
-```
+```sh
 python generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch'
 ```
 
 ##### (3) Runing local gradio
 
-```
+```sh
 cd gradio
 # if one uses dashscope’s API for prompt extension
 DASH_API_KEY=your_key python t2v_14B_singleGPU.py --prompt_extend_method 'dashscope' --ckpt_dir ./Wan2.1-T2V-14B