Commit e6aede1
Parent(s): 78120a0
init upload
- .gitattributes +0 -2
- README.md +11 -12
- assets/.DS_Store +0 -3
- assets/comp_effic.png +2 -2
- assets/input.png +0 -3
- assets/vben_vs_sota.png +2 -2
- assets/video_vae_res.jpg +2 -2
.gitattributes
CHANGED
@@ -35,11 +35,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 google/umt5-xxl/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 xlm-roberta-large/tokenizer.json filter=lfs diff=lfs merge=lfs -text
-assets/.DS_Store filter=lfs diff=lfs merge=lfs -text
 assets/comp_effic.png filter=lfs diff=lfs merge=lfs -text
 assets/data_for_diff_stage.jpg filter=lfs diff=lfs merge=lfs -text
 assets/i2v_res.png filter=lfs diff=lfs merge=lfs -text
-assets/input.png filter=lfs diff=lfs merge=lfs -text
 assets/logo.png filter=lfs diff=lfs merge=lfs -text
 assets/t2v_res.jpg filter=lfs diff=lfs merge=lfs -text
 assets/vben_1.3b_vs_sota.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -5,12 +5,12 @@
 <p>

 <p align="center">
-💜 <a href=""><b>Wan</b></a>    |    🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a>    |   🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>   |    📑 <a href="">Paper</a>    |    📑 <a href="">Blog</a>    |   💬 <a href="">WeChat
+💜 <a href=""><b>Wan</b></a>    |    🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a>    |   🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>   |    📑 <a href="">Paper (Coming soon)</a>    |    📑 <a href="https://wanxai.com">Blog</a>    |   💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat Group</a>   |    📖 <a href="https://discord.gg/p5XbdQV7">Discord</a>  
 <br>

 -----

-[**Wan: Open and Advanced Large-Scale Video Generative Models**](
+[**Wan: Open and Advanced Large-Scale Video Generative Models**]() <be>

 In this repository, we present **Wan2.1**, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. **Wan2.1** offers these key features:
 - 👍 **SOTA Performance**: **Wan2.1** consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
@@ -72,10 +72,10 @@ pip install -r requirements.txt

 | Models | Download Link | Notes |
 | --------------|-------------------------------------------------------------------------------|-------------------------------|
-| T2V-14B | [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B)
-| I2V-14B-720P | [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P)
-| I2V-14B-480P | [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P)
-| T2V-1.3B | [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)
+| T2V-14B | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-14B) | Supports both 480P and 720P
+| I2V-14B-720P | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-720P) | Supports 720P
+| I2V-14B-480P | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-480P) | Supports 480P
+| T2V-1.3B | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) 🤖 [ModelScope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) | Supports 480P

 > 💡Note: The 1.3B model is capable of generating videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable compared to 480P. For optimal performance, we recommend using 480P resolution.

@@ -83,7 +83,7 @@ pip install -r requirements.txt
 Download models using huggingface-cli:
 ```
 pip install "huggingface_hub[cli]"
-huggingface-cli download
+huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir ./Wan2.1-I2V-14B-720P
 ```


@@ -126,6 +126,7 @@ Similar to Text-to-Video, Image-to-Video is also divided into processes with and
 python generate.py --task i2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
 ```

+> 💡For the Image-to-Video task, the `size` parameter represents the area of the generated video, with the aspect ratio following that of the original input image.

 - Multi-GPU inference using FSDP + xDiT USP

@@ -137,8 +138,6 @@ torchrun --nproc_per_node=8 generate.py --task i2v-14B --size 1280*720 --ckpt_di
 ##### (2) Using Prompt Extention


-The process of prompt extension can be referenced [here](#2-using-prompt-extention).
-
 Run with local prompt extention using `Qwen/Qwen2.5-VL-7B-Instruct`:
 ```
 python generate.py --task i2v-14B --size 1280*720 --ckpt_dir ./Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --use_prompt_extend --prompt_extend_model Qwen/Qwen2.5-VL-7B-Instruct --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
@@ -228,7 +227,7 @@ We curated and deduplicated a candidate dataset comprising a vast amount of imag


 ##### Comparisons to SOTA
-We compared **Wan2.1** with leading open-source and closed-source models to evaluate the performace. Using our carefully designed set of 1,035 internal prompts, we tested across 14 major dimensions and 26 sub-dimensions.
+We compared **Wan2.1** with leading open-source and closed-source models to evaluate the performace. Using our carefully designed set of 1,035 internal prompts, we tested across 14 major dimensions and 26 sub-dimensions. We then compute the total score by performing a weighted calculation on the scores of each dimension, utilizing weights derived from human preferences in the matching process. The detailed results are shown in the table below. These results demonstrate our model's superior performance compared to both open-source and closed-source models.

 data:image/s3,"s3://crabby-images/b2513/b251308f29279c88f75ec08a1f04861fcdfae1e5" alt="figure1"

@@ -251,9 +250,9 @@ The models in this repository are licensed under the Apache 2.0 License. We clai

 ## Acknowledgements

-We would like to thank the contributors to the [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [
+We would like to thank the contributors to the [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [Qwen](https://huggingface.co/Qwen), [umt5-xxl](https://huggingface.co/google/umt5-xxl), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) repositories, for their open research.



 ## Contact Us
-If you would like to leave a message to our research or product teams, feel free to join our [Discord](https://discord.gg/p5XbdQV7) or [WeChat groups]()!
+If you would like to leave a message to our research or product teams, feel free to join our [Discord](https://discord.gg/p5XbdQV7) or [WeChat groups](https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg)!
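
Reviewer note on the `size` hint added to README.md in this commit: the note says that for Image-to-Video, `size` gives the area of the output while the aspect ratio follows the input image. The sketch below illustrates that relationship only. The helper name `fit_to_area` and the `multiple` parameter (snapping each side to a multiple of 16) are assumptions made for illustration and are not taken from `generate.py`.

```python
import math


def fit_to_area(image_w: int, image_h: int, size: str = "1280*720",
                multiple: int = 16) -> tuple[int, int]:
    """Pick (width, height) with roughly the area given by `size`
    and the aspect ratio of the input image.

    `multiple` is a hypothetical alignment constraint, not a documented one.
    """
    target_w, target_h = (int(v) for v in size.split("*"))
    area = target_w * target_h
    aspect = image_h / image_w  # height divided by width of the input image

    # Solve width * height = area subject to height / width = aspect.
    width = math.sqrt(area / aspect)
    height = math.sqrt(area * aspect)

    # Snap each side to the nearest multiple, never below one multiple.
    width = max(multiple, round(width / multiple) * multiple)
    height = max(multiple, round(height / multiple) * multiple)
    return width, height


if __name__ == "__main__":
    print(fit_to_area(1280, 720))    # landscape input, same ratio as `size`
    print(fit_to_area(1000, 1000))   # square input keeps a square output
```

Under these assumptions a square input with `--size 1280*720` would come out square at about the same pixel count, rather than being stretched to 1280 by 720.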
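
Reviewer note on the "Comparisons to SOTA" sentence added in this commit: the README describes the total score as a weighted calculation over per-dimension scores, with weights derived from human preferences. A minimal sketch of one such calculation follows. The dimension names, the example numbers, and the choice to normalize by the sum of the weights are all assumptions; the README does not give the exact formula.

```python
def weighted_total(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores.

    Normalizing by the weight sum is an assumption; the source only says
    "weighted calculation".
    """
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight for dimensions: {sorted(missing)}")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


if __name__ == "__main__":
    # Hypothetical dimensions and values, for illustration only.
    example_scores = {"motion_quality": 0.5, "text_alignment": 0.75}
    example_weights = {"motion_quality": 2.0, "text_alignment": 2.0}
    print(weighted_total(example_scores, example_weights))
```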
assets/.DS_Store
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:d65165279105ca6773180500688df4bdc69a2c7b771752f0a46ef120b7fd8ec3
-size 6148
assets/comp_effic.png
CHANGED
assets/input.png
DELETED
assets/vben_vs_sota.png
CHANGED
assets/video_vae_res.jpg
CHANGED