Add link to paper, code, and project page

#32
by nielsr - opened
Files changed (1)
  1. README.md +12 -11
README.md CHANGED
@@ -1,16 +1,17 @@
  ---
- license: apache-2.0
  language:
  - en
  - zh
+ library_name: diffusers
+ license: apache-2.0
  pipeline_tag: text-to-video
  tags:
  - video generation
- library_name: diffusers
  inference:
    parameters:
      num_inference_steps: 10
  ---
+
  # Wan2.1

  <p align="center">
@@ -18,7 +19,7 @@ inference:
  <p>

  <p align="center">
- 💜 <a href=""><b>Wan</b></a> &nbsp&nbsp | &nbsp&nbsp 🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a> &nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Paper (Coming soon)</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://wanxai.com">Blog</a> &nbsp&nbsp | &nbsp&nbsp💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat Group</a>&nbsp&nbsp | &nbsp&nbsp 📖 <a href="https://discord.gg/p5XbdQV7">Discord</a>&nbsp&nbsp
+ 💜 <a href="https://wan.video"><b>Wan</b></a> &nbsp&nbsp | &nbsp&nbsp 🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a> &nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://huggingface.co/papers/2503.20314">Paper</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://wan.video/welcome?spm=a2ty_o02.30011076.0.0.6c9ee41eCcluqg">Blog</a> &nbsp&nbsp | &nbsp&nbsp💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat Group</a>&nbsp&nbsp | &nbsp&nbsp 📖 <a href="https://discord.gg/AKNgpMK4Yj">Discord</a>&nbsp&nbsp
  <br>

  -----
@@ -68,13 +69,13 @@ This repository features our T2V-14B model, which establishes a new SOTA perform

  #### Installation
  Clone the repo:
- ```
+ ```sh
  git clone https://github.com/Wan-Video/Wan2.1.git
  cd Wan2.1
  ```

  Install dependencies:
- ```
+ ```sh
  # Ensure torch >= 2.4.0
  pip install -r requirements.txt
  ```
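Note, not part of this diff: the inference commands below assume the checkpoints already sit in `./Wan2.1-T2V-14B`. A minimal sketch for fetching them with `huggingface-cli` (any equivalent download route works; the target directory only has to match `--ckpt_dir`):

```sh
# Sketch, not from this PR: download the T2V-14B weights into the directory
# that the generate.py commands below reference via --ckpt_dir.
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir ./Wan2.1-T2V-14B
```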
@@ -142,13 +143,13 @@ To facilitate implementation, we will start with a basic version of the inferenc

  - Single-GPU inference

- ```
+ ```sh
  python generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
  ```

  If you encounter OOM (Out-of-Memory) issues, you can use the `--offload_model True` and `--t5_cpu` options to reduce GPU memory usage. For example, on an RTX 4090 GPU:

- ```
+ ```sh
  python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
  ```

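Note, not part of this diff: the hunk above shows the offload flags only with the 1.3B task. A hedged sketch combining the same `--offload_model True` and `--t5_cpu` switches with the 14B task, assuming the remaining VRAM can still hold the DiT weights:

```sh
# Sketch, not from this PR: the same memory-saving flags applied to the 14B model.
# --offload_model True and --t5_cpu trade speed for lower GPU memory usage.
python generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --offload_model True --t5_cpu --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```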
@@ -157,7 +158,7 @@ python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B

  - Multi-GPU inference using FSDP + xDiT USP

- ```
+ ```sh
  pip install "xfuser>=0.4.1"
  torchrun --nproc_per_node=8 generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
  ```
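Note, not part of this diff: the multi-GPU command is written for an 8-GPU node. A sketch for four GPUs, assuming `--ulysses_size` should stay equal to the number of processes launched by `torchrun`:

```sh
# Sketch, not from this PR: 4-GPU variant; keep --ulysses_size in sync with
# --nproc_per_node.
torchrun --nproc_per_node=4 generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```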
@@ -172,7 +173,7 @@ Extending the prompts can effectively enrich the details in the generated videos
  - Configure the environment variable `DASH_API_KEY` to specify the Dashscope API key. For users of Alibaba Cloud's international site, you also need to set the environment variable `DASH_API_URL` to 'https://dashscope-intl.aliyuncs.com/api/v1'. For more detailed instructions, please refer to the [dashscope document](https://www.alibabacloud.com/help/en/model-studio/developer-reference/use-qwen-by-calling-api?spm=a2c63.p38356.0.i1).
  - Use the `qwen-plus` model for text-to-video tasks and `qwen-vl-max` for image-to-video tasks.
  - You can modify the model used for extension with the parameter `--prompt_extend_model`. For example:
- ```
+ ```sh
  DASH_API_KEY=your_key python generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'dashscope' --prompt_extend_target_lang 'ch'
  ```

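Note, not part of this diff: per the first bullet above, users of Alibaba Cloud's international site also need `DASH_API_URL`. A sketch with both variables exported up front (`your_key` stays a placeholder):

```sh
# Sketch, not from this PR: Dashscope prompt extension against the
# international endpoint, per the DASH_API_URL note above.
export DASH_API_KEY=your_key
export DASH_API_URL='https://dashscope-intl.aliyuncs.com/api/v1'
python generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'dashscope' --prompt_extend_target_lang 'ch'
```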
@@ -184,13 +185,13 @@ DASH_API_KEY=your_key python generate.py --task t2v-14B --size 1280*720 --ckpt_
  - Larger models generally provide better extension results but require more GPU memory.
  - You can modify the model used for extension with the parameter `--prompt_extend_model`, allowing you to specify either a local model path or a Hugging Face model. For example:

- ```
+ ```sh
  python generate.py --task t2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch'
  ```

  ##### (3) Running local gradio

- ```
+ ```sh
  cd gradio
  # if one uses dashscope's API for prompt extension
  DASH_API_KEY=your_key python t2v_14B_singleGPU.py --prompt_extend_method 'dashscope' --ckpt_dir ./Wan2.1-T2V-14B
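Note, not part of this diff: the bullets above say `--prompt_extend_model` also accepts a Hugging Face model id; `Qwen/Qwen2.5-7B-Instruct` below is one plausible choice, not one this README prescribes, and the gradio variant assumes `t2v_14B_singleGPU.py` mirrors the `local_qwen` switch shown for `generate.py`:

```sh
# Sketch, not from this PR: pin the extension model explicitly; any local path
# or Hugging Face id of a compatible Qwen model could replace the one named here.
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_model 'Qwen/Qwen2.5-7B-Instruct' --prompt_extend_target_lang 'ch'

# Hypothetical gradio counterpart: local prompt extension instead of dashscope.
cd gradio
python t2v_14B_singleGPU.py --prompt_extend_method 'local_qwen' --ckpt_dir ./Wan2.1-T2V-14B
```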
 