# Download Pretrained Models

All models are stored in `HunyuanVideo/ckpts` by default; the file structure is as follows:
```shell
HunyuanVideo
├──ckpts
│ ├──README.md
│ ├──hunyuan-video-t2v-720p
│ │ ├──transformers
│ │ │ ├──mp_rank_00_model_states.pt
│ │ │ ├──mp_rank_00_model_states_fp8.pt
│ │ │ ├──mp_rank_00_model_states_fp8_map.pt
│ │ ├──vae
│ ├──text_encoder
│ ├──text_encoder_2
├──...
```
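Once the downloads described below are complete, the layout above can be sanity-checked with a short stdlib-only script. This is a sketch: the `missing_checkpoints` helper and the `REQUIRED` subset of paths are ours, not part of the repository.

```python
from pathlib import Path

# A few key entries from the tree above (illustrative subset, not exhaustive).
REQUIRED = [
    "hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt",
    "hunyuan-video-t2v-720p/vae",
    "text_encoder",
    "text_encoder_2",
]

def missing_checkpoints(ckpts_dir="ckpts"):
    """Return the REQUIRED entries not yet present under ckpts_dir."""
    root = Path(ckpts_dir)
    return [p for p in REQUIRED if not (root / p).exists()]

# Anything printed here still needs to be downloaded.
print(missing_checkpoints())
```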

## Download HunyuanVideo model

To download the HunyuanVideo model, first install `huggingface-cli`. (Detailed instructions are available [here](https://huggingface.co/docs/huggingface_hub/guides/cli).)

```shell
python -m pip install "huggingface_hub[cli]"
```
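If you are unsure whether the install put the CLI on your `PATH`, a quick stdlib-only check can tell you before you start a long download. This is a sketch; the `cli_available` helper name is ours.

```python
import shutil

def cli_available(name="huggingface-cli"):
    """True if the named executable is found on PATH."""
    return shutil.which(name) is not None

if not cli_available():
    print("huggingface-cli not found; re-run the pip install above.")
```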

Then download the model using the following commands:

```shell
# Switch to the directory named 'HunyuanVideo'
cd HunyuanVideo
# Use the huggingface-cli tool to download the HunyuanVideo model into the HunyuanVideo/ckpts dir.
# The download may take from 10 minutes to 1 hour depending on network conditions.
huggingface-cli download tencent/HunyuanVideo --local-dir ./ckpts
```

<details>
<summary>💡 Tips for using huggingface-cli (network problems)</summary>

##### 1. Using HF-Mirror

If you encounter slow download speeds in China, you can use a mirror to speed up the download. For example:

```shell
HF_ENDPOINT=https://hf-mirror.com huggingface-cli download tencent/HunyuanVideo --local-dir ./ckpts
```

##### 2. Resuming a Download

`huggingface-cli` supports resuming downloads. If a download is interrupted, simply rerun the download command to resume it.

Note: If an error like `No such file or directory: 'ckpts/.huggingface/.gitignore.lock'` occurs during the download, you can ignore it and rerun the download command.

</details>
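The two tips above amount to rerunning one command, optionally with an `HF_ENDPOINT` override. If you script the download, a small helper can make that explicit. This is a sketch: `build_download_cmd` is a hypothetical name, not part of the repository or of `huggingface_hub`.

```python
import os

def build_download_cmd(repo_id, local_dir, endpoint=None):
    """Return (env, argv) for a huggingface-cli download, optionally via a mirror."""
    env = dict(os.environ)
    if endpoint:
        env["HF_ENDPOINT"] = endpoint  # e.g. https://hf-mirror.com
    argv = ["huggingface-cli", "download", repo_id, "--local-dir", local_dir]
    return env, argv

env, argv = build_download_cmd("tencent/HunyuanVideo", "./ckpts",
                               endpoint="https://hf-mirror.com")
# Pass to subprocess.run(argv, env=env, check=True) to execute; rerunning the
# same command resumes an interrupted download.
```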

---

## Download Text Encoder

HunyuanVideo uses an MLLM model and a CLIP model as its text encoders.

1. MLLM model (text_encoder folder)

HunyuanVideo supports different MLLMs (including HunyuanMLLM and open-source MLLM models). At this stage, we have not yet released HunyuanMLLM. We recommend that users in the community use [llava-llama-3-8b](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers), provided by [Xtuner](https://huggingface.co/xtuner), which can be downloaded with the following command:

```shell
cd HunyuanVideo/ckpts
huggingface-cli download xtuner/llava-llama-3-8b-v1_1-transformers --local-dir ./llava-llama-3-8b-v1_1-transformers
```

To save GPU memory when loading the model, we separate the language model part of `llava-llama-3-8b-v1_1-transformers` into `text_encoder`:

```shell
cd HunyuanVideo
python hyvideo/utils/preprocess_text_encoder_tokenizer_utils.py --input_dir ckpts/llava-llama-3-8b-v1_1-transformers --output_dir ckpts/text_encoder
```

2. CLIP model (text_encoder_2 folder)

We use [CLIP](https://huggingface.co/openai/clip-vit-large-patch14), provided by [OpenAI](https://openai.com), as the second text encoder. Users in the community can download this model with the following command:

```shell
cd HunyuanVideo/ckpts
huggingface-cli download openai/clip-vit-large-patch14 --local-dir ./text_encoder_2
```