AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset

This repository is the official PyTorch implementation of AccVideo, a novel and efficient distillation method that accelerates video diffusion models using a synthetic dataset. Our method generates videos up to 8.5x faster than HunyuanVideo on a single A100 GPU.

🔥🔥🔥 News

  • Jun 3, 2025: We release the inference code and model weights of AccVideo based on WanXI2V-480P-14B.
  • May 26, 2025: We release the inference code and model weights of AccVideo based on WanXT2V-14B.
  • Mar 31, 2025: ComfyUI-Kijai (FP8 Inference): ComfyUI-Integration by Kijai
  • Mar 26, 2025: We release the inference code and model weights of AccVideo based on HunyuanT2V.

🎥 Demo (Based on HunyuanT2V)

https://github.com/user-attachments/assets/59f3c5db-d585-4773-8d92-366c1eb040f0

🎥 Demo (Based on WanXT2V-14B)

https://github.com/user-attachments/assets/ff9724da-b76c-478d-a9bf-0ee7240494b2

🎥 Demo (Based on WanXI2V-480P-14B)

📑 Open-source Plan

  • Inference
  • Checkpoints
  • Multi-GPU Inference
  • Synthetic Video Dataset, SynVid
  • Training

🔧 Installation

The code is tested with Python 3.10.0 and CUDA 11.8 on an A100 GPU.

conda create -n accvideo python==3.10.0
conda activate accvideo

pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install flash-attn==2.7.3 --no-build-isolation
pip install "huggingface_hub[cli]"
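
After installation, a quick check that the pinned packages are present can save a failed run later. The sketch below is not part of the repository: `PINNED` and `check_environment` are hypothetical names, and the script only compares installed distribution versions against the pins listed in the commands above.

```python
from importlib import metadata

# Versions pinned in the installation commands above.
PINNED = {
    "torch": "2.4.0",
    "torchvision": "0.19.0",
    "torchaudio": "2.4.0",
    "flash-attn": "2.7.3",
}


def check_environment(pinned=PINNED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Wheels from the cu118 index may carry a local suffix such as
        # "+cu118", so compare only the public part of the version.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:12s} installed={installed} pinned={PINNED[name]} [{status}]")
```

A mismatch does not necessarily mean the code will fail; it only indicates that the environment differs from the one the authors tested.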

🤗 Checkpoints

To download the checkpoints (based on HunyuanT2V), use the following command:

# Download the model weight
huggingface-cli download aejion/AccVideo --local-dir ./ckpts

To download the checkpoints (based on WanX-T2V-14B), use the following command:

# Download the model weight
huggingface-cli download aejion/AccVideo-WanX-T2V-14B --local-dir ./wanx_t2v_ckpts

To download the checkpoints (based on WanX-I2V-480P-14B), use the following command:

# Download the model weight
huggingface-cli download aejion/AccVideo-WanX-I2V-480P-14B --local-dir ./wanx_i2v_ckpts
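
The same downloads can be scripted from Python with `huggingface_hub.snapshot_download`. This is a minimal sketch rather than something shipped with the repository: the variant keys (`hunyuan_t2v`, `wanx_t2v`, `wanx_i2v`) and the `download_checkpoint` helper are hypothetical, while the repository IDs and target directories are copied from the commands above.

```python
# Repository IDs and target directories taken from the commands above.
# The dictionary keys are illustrative names, not identifiers used by AccVideo.
CHECKPOINTS = {
    "hunyuan_t2v": ("aejion/AccVideo", "./ckpts"),
    "wanx_t2v": ("aejion/AccVideo-WanX-T2V-14B", "./wanx_t2v_ckpts"),
    "wanx_i2v": ("aejion/AccVideo-WanX-I2V-480P-14B", "./wanx_i2v_ckpts"),
}


def download_checkpoint(variant):
    """Download one checkpoint set and return the local directory path."""
    # Imported lazily so the mapping can be inspected without the package.
    from huggingface_hub import snapshot_download

    repo_id, local_dir = CHECKPOINTS[variant]
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    import sys

    variant = sys.argv[1] if len(sys.argv) > 1 else "hunyuan_t2v"
    print(download_checkpoint(variant))
```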

🚀 Inference

We recommend using a GPU with 80GB of memory. We use AccVideo to distill Hunyuan and WanX.

Inference for HunyuanT2V

To run the inference, use the following command:

export MODEL_BASE=./ckpts
python sample_t2v.py \
    --height 544 \
    --width 960 \
    --num_frames 93 \
    --num_inference_steps 5 \
    --guidance_scale 1 \
    --embedded_cfg_scale 6 \
    --flow_shift 7 \
    --flow-reverse \
    --prompt_file ./assets/prompt.txt \
    --seed 1024 \
    --output_path ./results/accvideo-544p \
    --model_path ./ckpts \
    --dit-weight ./ckpts/accvideo-t2v-5-steps/diffusion_pytorch_model.pt
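
When several resolutions need to be sampled, it can be convenient to assemble the command above programmatically. The sketch below uses only the flags shown in that command; `build_t2v_command` is a hypothetical helper, not part of the repository.

```python
import os
import subprocess
import sys


def build_t2v_command(height=544, width=960, num_frames=93,
                      output_path="./results/accvideo-544p",
                      model_path="./ckpts"):
    """Assemble the sample_t2v.py argument list from the command above."""
    return [
        sys.executable, "sample_t2v.py",
        "--height", str(height),
        "--width", str(width),
        "--num_frames", str(num_frames),
        "--num_inference_steps", "5",
        "--guidance_scale", "1",
        "--embedded_cfg_scale", "6",
        "--flow_shift", "7",
        "--flow-reverse",
        "--prompt_file", "./assets/prompt.txt",
        "--seed", "1024",
        "--output_path", output_path,
        "--model_path", model_path,
        "--dit-weight",
        f"{model_path}/accvideo-t2v-5-steps/diffusion_pytorch_model.pt",
    ]


if __name__ == "__main__":
    # Mirrors `export MODEL_BASE=./ckpts` from the shell command.
    env = dict(os.environ, MODEL_BASE="./ckpts")
    subprocess.run(build_t2v_command(), env=env, check=True)
```

Calling `build_t2v_command(height=720, width=1280, num_frames=129, output_path="./results/accvideo-720p")` would target the larger setting; keeping the remaining flags unchanged for that resolution is an assumption, as is the output directory name.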

The following table compares inference time on a single A100 GPU:

| Model | Setting (height/width/frame) | Inference Time (s) |
|---|---|---|
| HunyuanVideo | 720px1280px129f | 3234 |
| Ours | 720px1280px129f | 380 (8.5x faster) |
| HunyuanVideo | 544px960px93f | 704 |
| Ours | 544px960px93f | 91 (7.7x faster) |

Inference for WanXT2V

To run the inference, use the following command:

python sample_wanx_t2v.py \
       --task t2v-14B \
       --size 832*480 \
       --ckpt_dir ./wanx_t2v_ckpts \
       --sample_solver 'unipc' \
       --save_dir ./results/accvideo_wanx_14B \
       --sample_steps 10

The following table compares inference time on a single A100 GPU:

| Model | Setting (height/width/frame) | Inference Time (s) |
|---|---|---|
| WanX | 480px832px81f | 932 |
| Ours | 480px832px81f | 97 (9.6x faster) |

Inference for WanXI2V-480P

To run the inference, use the following command:

python sample_wanx_i2v.py \
       --task i2v-14B \
       --size 832*480 \
       --ckpt_dir ./wanx_i2v_ckpts \
       --sample_solver 'unipc' \
       --save_dir ./results/accvideo_wanx_i2v_14B \
       --sample_steps 10

The following table compares inference time on a single A100 GPU:

| Model | Setting (height/width/frame) | Inference Time (s) |
|---|---|---|
| WanX-I2V | 480px832px81f | 768 |
| Ours | 480px832px81f | 112 (6.8x faster) |
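
The speedup factors in the inference-time tables are the baseline time divided by the AccVideo time. The short script below recomputes them from the reported timings. Truncating to one decimal place is an assumption about how the figures were derived; it reproduces every reported value, whereas rounding would give 6.9x for the last row. The row labels and the `speedup` helper are illustrative.

```python
import math

# (baseline seconds, AccVideo seconds) copied from the inference-time tables.
TIMINGS = {
    "HunyuanT2V 720px1280px129f": (3234, 380),
    "HunyuanT2V 544px960px93f": (704, 91),
    "WanXT2V 480px832px81f": (932, 97),
    "WanXI2V 480px832px81f": (768, 112),
}


def speedup(baseline_s, ours_s):
    """Speedup factor, truncated (not rounded) to one decimal place."""
    return math.floor(baseline_s / ours_s * 10) / 10


for name, (baseline_s, ours_s) in TIMINGS.items():
    print(f"{name}: {speedup(baseline_s, ours_s)}x faster")
```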

πŸ† VBench Results

We report VBench evaluation results for our distilled models. Videos were generated with the augmented prompts provided by the VBench team: the HunyuanVideo augmented prompts for AccVideo-HunyuanT2V and the WanX augmented prompts for AccVideo-WanXT2V.

| Model | Setting (height/width/frame) | Total Score | Quality Score | Semantic Score | Subject Consistency | Background Consistency | Temporal Flickering | Motion Smoothness | Dynamic Degree | Aesthetic Quality | Image Quality | Object Class | Multiple Objects | Human Action | Color | Spatial Relationship | Scene | Appearance Style | Temporal Style | Overall Consistency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AccVideo-HunyuanT2V | 544px960px93f | 83.26% | 84.58% | 77.96% | 94.46% | 97.45% | 99.18% | 98.79% | 75.00% | 62.08% | 65.64% | 92.99% | 67.33% | 95.60% | 94.11% | 75.70% | 54.72% | 19.87% | 23.71% | 27.21% |
| AccVideo-WanXT2V | 480px832px81f | 85.95% | 86.62% | 83.25% | 95.02% | 97.75% | 99.54% | 97.95% | 93.33% | 64.21% | 68.42% | 98.38% | 86.58% | 97.40% | 92.04% | 75.68% | 59.82% | 23.88% | 24.62% | 27.34% |
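
The Total Score values are consistent with a weighted average in which the Quality Score counts four times as heavily as the Semantic Score. These weights are an assumption based on the usual VBench scoring convention, not something stated in this repository; the snippet below only checks that they reproduce the reported totals.

```python
# Assumed VBench aggregation weights (not stated in this repository).
QUALITY_WEIGHT = 4
SEMANTIC_WEIGHT = 1

# (quality score, semantic score, reported total score), in percent,
# copied from the VBench results above.
REPORTED = {
    "AccVideo-HunyuanT2V": (84.58, 77.96, 83.26),
    "AccVideo-WanXT2V": (86.62, 83.25, 85.95),
}


def total_score(quality, semantic):
    """Weighted average of the quality and semantic scores."""
    weighted = QUALITY_WEIGHT * quality + SEMANTIC_WEIGHT * semantic
    return weighted / (QUALITY_WEIGHT + SEMANTIC_WEIGHT)


for model, (quality, semantic, reported) in REPORTED.items():
    computed = total_score(quality, semantic)
    print(f"{model}: computed {computed:.2f}%, reported {reported:.2f}%")
```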

🔗 BibTeX

If you find AccVideo useful for your research and applications, please cite using this BibTeX:

@article{zhang2025accvideo,
    title={AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset},
    author={Zhang, Haiyu and Chen, Xinyuan and Wang, Yaohui and Liu, Xihui and Wang, Yunhong and Qiao, Yu},
    journal={arXiv preprint arXiv:2503.19462},
    year={2025}
}

Acknowledgements

The code is built upon FastVideo and HunyuanVideo. We thank all the contributors for open-sourcing their work.
