alibaba-pai/EasyAnimateV5.1-12b-zh-Control-Camera

bubbliiiing committed 2 days ago

Commit 1f9a2e5 (1 parent: 048d3c9)

Update Readme

Files changed (2)

README.md +17 -39
README_en.md +19 -13

README.md CHANGED

@@ -1,33 +1,5 @@
 ---
-frameworks:
-- Pytorch
-license: other
-tasks:
-- text-to-video-synthesis
-#model-type:
-##如 gpt、phi、llama、chatglm、baichuan 等
-#- gpt
-#domain:
-##如 nlp、cv、audio、multi-modal
-#- nlp
-#language:
-##语言代码列表 https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
-#- cn
-#metrics:
-##如 CIDEr、Blue、ROUGE 等
-#- CIDEr
-#tags:
-##各种自定义，包括 pretrained、fine-tuned、instruction-tuned、RL-tuned 等训练方法和其他
-#- pretrained
-#tools:
-##如 vllm、fastchat、llamacpp、AdaSeq 等
-#- vllm
 ---
 [![Arxiv Page](https://img.shields.io/badge/Arxiv-Page-red)](https://arxiv.org/abs/2405.18991)
@@ -44,6 +16,14 @@ EasyAnimate是一个基于transformer结构的pipeline，可用于生成AI图片
 # 模型地址
 EasyAnimateV5.1:
 12B:
 | 名称 | 种类 | 存储空间 | Hugging Face | Model Scope | 描述 |
 |--|--|--|--|--|--|
@@ -344,23 +324,21 @@ Linux 的详细信息：
 我们需要大约 60GB 的可用磁盘空间，请检查！
 EasyAnimateV5.1-12B的视频大小可以由不同的GPU Memory生成，包括：
-| GPU memory |384x672x72|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
 |----------|----------|----------|----------|----------|----------|----------|
-| 16GB | 🧡 | 🧡 | ❌ | ❌ | ❌ | ❌ |
-| 24GB | 🧡 | 🧡 | 🧡 | 🧡 | ❌ | ❌ |
-| 40GB | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
 | 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 EasyAnimateV5.1-7B的视频大小可以由不同的GPU Memory生成，包括：
-| GPU memory |384x672x72|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
 |----------|----------|----------|----------|----------|----------|----------|
-| 16GB | 🧡 | 🧡 | ❌ | ❌ | ❌ | ❌ |
-| 24GB | ✅ | ✅ | 🧡 | 🧡 | ❌ | ❌ |
-| 40GB | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
 | 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-由于qwen2-vl-7b的float16的权重，无法在16GB显存下运行，如果您的显存是16GB，请前往[Huggingface](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8)或者[Modelscope](https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8)下载量化后的qwen2-vl-7b对原有的text encoder进行替换，并安装对应的依赖库（auto-gptq, optimum）。
 ✅ 表示它可以在"model_cpu_offload"的情况下运行，🧡代表它可以在"model_cpu_offload_and_qfloat8"的情况下运行，⭕️ 表示它可以在"sequential_cpu_offload"的情况下运行，❌ 表示它无法运行。请注意，使用sequential_cpu_offload运行会更慢。
 有一些不支持torch.bfloat16的卡型，如2080ti、V100，需要将app.py、predict文件中的weight_dtype修改为torch.float16才可以运行。

 ---
+license: apache-2.0
 ---
 [![Arxiv Page](https://img.shields.io/badge/Arxiv-Page-red)](https://arxiv.org/abs/2405.18991)
 # 模型地址
 EasyAnimateV5.1:
+7B:
+| 名称 | 种类 | 存储空间 | Hugging Face | Model Scope | 描述 |
+|--|--|--|--|--|--|
+| EasyAnimateV5.1-7b-zh-InP | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-InP)| 官方的图生视频权重。支持多分辨率（512，768，1024）的视频预测，支持多分辨率（512，768，1024）的视频预测，以49帧、每秒8帧进行训练，支持多语言预测 |
+| EasyAnimateV5.1-7b-zh-Control | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control)| 官方的视频控制权重，支持不同的控制条件，如Canny、Depth、Pose、MLSD等，同时支持使用轨迹控制。支持多分辨率（512，768，1024）的视频预测，支持多分辨率（512，768，1024）的视频预测，以49帧、每秒8帧进行训练，支持多语言预测 |
+| EasyAnimateV5.1-7b-zh-Control-Camera | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control-Camera) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control-Camera)| 官方的视频相机控制权重，支持通过输入相机运动轨迹控制生成方向。支持多分辨率（512，768，1024）的视频预测，支持多分辨率（512，768，1024）的视频预测，以49帧、每秒8帧进行训练，支持多语言预测 |
+| EasyAnimateV5.1-7b-zh | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh)| 官方的文生视频权重。支持多分辨率（512，768，1024）的视频预测，支持多分辨率（512，768，1024）的视频预测，以49帧、每秒8帧进行训练，支持多语言预测 |
 12B:
 | 名称 | 种类 | 存储空间 | Hugging Face | Model Scope | 描述 |
 |--|--|--|--|--|--|
 我们需要大约 60GB 的可用磁盘空间，请检查！
 EasyAnimateV5.1-12B的视频大小可以由不同的GPU Memory生成，包括：
+| GPU memory |384x672x25|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
 |----------|----------|----------|----------|----------|----------|----------|
+| 16GB | 🧡 | ⭕️ | ⭕️ | ⭕️ | ❌ | ❌ |
+| 24GB | 🧡 | 🧡 | 🧡 | 🧡 | 🧡 | ❌ |
+| 40GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 EasyAnimateV5.1-7B的视频大小可以由不同的GPU Memory生成，包括：
+| GPU memory |384x672x25|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
 |----------|----------|----------|----------|----------|----------|----------|
+| 16GB | 🧡 | 🧡 | ⭕️ | ⭕️ | ❌ | ❌ |
+| 24GB | ✅ | ✅ | ✅ | 🧡 | 🧡 | ❌ |
+| 40GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 ✅ 表示它可以在"model_cpu_offload"的情况下运行，🧡代表它可以在"model_cpu_offload_and_qfloat8"的情况下运行，⭕️ 表示它可以在"sequential_cpu_offload"的情况下运行，❌ 表示它无法运行。请注意，使用sequential_cpu_offload运行会更慢。
 有一些不支持torch.bfloat16的卡型，如2080ti、V100，需要将app.py、predict文件中的weight_dtype修改为torch.float16才可以运行。

README_en.md CHANGED

@@ -15,6 +15,14 @@ EasyAnimate is a pipeline based on the transformer architecture, designed for ge
 EasyAnimateV5.1:
 12B:
 | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
@@ -317,22 +325,20 @@ The detailed of Linux:
 We need about 60GB available on disk (for saving weights), please check!
 The video size for EasyAnimateV5.1-12B can be generated by different GPU Memory, including:
-| GPU memory | 384x672x72 | 384x672x49 | 576x1008x25 | 576x1008x49 | 768x1344x25 | 768x1344x49 |
 |------------|------------|------------|------------|------------|------------|------------|
-| 16GB       | 🧡         | 🧡         | ❌         | ❌         | ❌         | ❌         |
-| 24GB       | 🧡         | 🧡         | 🧡         | 🧡         | ❌         | ❌         |
-| 40GB       | ✅         | ✅         | ✅         | ✅         | ❌         | ❌         |
-| 80GB       | ✅         | ✅         | ✅         | ✅         | ✅         | ✅         |
 The video size for EasyAnimateV5.1-7B can be generated by different GPU Memory, including:
-| GPU memory | 384x672x72 | 384x672x49 | 576x1008x25 | 576x1008x49 | 768x1344x25 | 768x1344x49 |
-|------------|------------|------------|------------|------------|------------|------------|
-| 16GB       | 🧡         | 🧡         | ❌         | ❌         | ❌         | ❌         |
-| 24GB       | ✅         | ✅         | 🧡         | 🧡         | ❌         | ❌         |
-| 40GB       | ✅         | ✅         | ✅         | ✅         | ❌         | ❌         |
-| 80GB       | ✅         | ✅         | ✅         | ✅         | ✅         | ✅         |
-Due to the float16 weights of qwen2-vl-7b, it cannot run on a 16GB GPU. If your GPU memory is 16GB, please visit [Huggingface](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8) or [Modelscope](https://modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8) to download the quantized version of qwen2-vl-7b to replace the original text encoder, and install the corresponding dependency libraries (auto-gptq, optimum).
 ✅ indicates it can run under "model_cpu_offload", 🧡 represents it can run under "model_cpu_offload_and_qfloat8", ⭕️ indicates it can run under "sequential_cpu_offload", ❌ means it can't run. Please note that running with sequential_cpu_offload will be slower.

 EasyAnimateV5.1:
+7B:
+| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
+|--|--|--|--|--|--|
+| EasyAnimateV5.1-7b-zh-InP | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports for multilingual prediction. |
+| EasyAnimateV5.1-7b-zh-Control | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control) | Official video control weights, supporting various control conditions such as Canny, Depth, Pose, MLSD, and trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports for multilingual prediction. |
+| EasyAnimateV5.1-7b-zh-Control-Camera | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh-Control-Camera) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh-Control-Camera) | Official video camera control weights, supporting direction generation control by inputting camera motion trajectories. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports for multilingual prediction. |
+| EasyAnimateV5.1-7b-zh | EasyAnimateV5.1 | 30 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-7b-zh) | Official text-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports for multilingual prediction. |
 12B:
 | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
 We need about 60GB available on disk (for saving weights), please check!
 The video size for EasyAnimateV5.1-12B can be generated by different GPU Memory, including:
+| GPU memory | 384x672x25 | 384x672x49 | 576x1008x25 | 576x1008x49 | 768x1344x25 | 768x1344x49 |
 |------------|------------|------------|------------|------------|------------|------------|
+| 16GB | 🧡 | ⭕️ | ⭕️ | ⭕️ | ❌ | ❌ |
+| 24GB | 🧡 | 🧡 | 🧡 | 🧡 | 🧡 | ❌ |
+| 40GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 The video size for EasyAnimateV5.1-7B can be generated by different GPU Memory, including:
+| GPU memory |384x672x25|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
+|----------|----------|----------|----------|----------|----------|----------|
+| 16GB | 🧡 | 🧡 | ⭕️ | ⭕️ | ❌ | ❌ |
+| 24GB | ✅ | ✅ | ✅ | 🧡 | 🧡 | ❌ |
+| 40GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 ✅ indicates it can run under "model_cpu_offload", 🧡 represents it can run under "model_cpu_offload_and_qfloat8", ⭕️ indicates it can run under "sequential_cpu_offload", ❌ means it can't run. Please note that running with sequential_cpu_offload will be slower.
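
As a reading aid for the updated tables, the sketch below encodes the new 12B and 7B rows and returns the memory mode named in the legend for a given model, GPU size, and video size. The names `TABLES` and `suggest_memory_mode` are hypothetical and not part of EasyAnimate, and rounding a GPU down to the nearest listed tier (for example, treating a 32GB card as 24GB) is an assumption of this sketch, not something the README states.

```python
from typing import Optional

# Mode names copied from the README legend.
MODEL_CPU_OFFLOAD = "model_cpu_offload"
MODEL_CPU_OFFLOAD_AND_QFLOAT8 = "model_cpu_offload_and_qfloat8"
SEQUENTIAL_CPU_OFFLOAD = "sequential_cpu_offload"

# Column order of the updated tables (height x width x frames).
SIZES = [
    "384x672x25", "384x672x49",
    "576x1008x25", "576x1008x49",
    "768x1344x25", "768x1344x49",
]

# One letter per table cell: Y = ✅, Q = 🧡, S = ⭕️, N = ❌.
_SYMBOLS = {
    "Y": MODEL_CPU_OFFLOAD,
    "Q": MODEL_CPU_OFFLOAD_AND_QFLOAT8,
    "S": SEQUENTIAL_CPU_OFFLOAD,
    "N": None,
}

# Rows transcribed from the "+" lines of this commit (hypothetical encoding).
TABLES = {
    "12B": {16: "QSSSNN", 24: "QQQQQN", 40: "YYYYYY", 80: "YYYYYY"},
    "7B": {16: "QQSSNN", 24: "YYYQQN", 40: "YYYYYY", 80: "YYYYYY"},
}


def suggest_memory_mode(model: str, gpu_gb: int, size: str) -> Optional[str]:
    """Return the legend's mode name, or None if the table marks the cell as ❌.

    Assumption: a GPU between two listed tiers is rounded down to the lower one;
    a GPU below 16GB is treated as unable to run.
    """
    column = SIZES.index(size)  # raises ValueError for a size not in the table
    tiers = [tier for tier in TABLES[model] if tier <= gpu_gb]
    if not tiers:
        return None
    row = TABLES[model][max(tiers)]
    return _SYMBOLS[row[column]]
```

For instance, `suggest_memory_mode("12B", 16, "384x672x49")` looks up the ⭕️ cell in the 16GB row of the 12B table, so it returns the sequential offload mode, which the README notes is the slower option.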