QVGen: Pushing the Limit of Quantized Video Generative Models
Abstract
QVGen is a quantization-aware training framework that enhances the performance and efficiency of video diffusion models under low-bit quantization, achieving high-quality video synthesis comparable to full-precision models.
Video diffusion models (DMs) have enabled high-quality video synthesis. Yet, their substantial computational and memory demands pose serious challenges to real-world deployment, even on high-end GPUs. As a commonly adopted solution, quantization has achieved notable success in reducing the cost of image DMs, yet its direct application to video DMs remains ineffective. In this paper, we present QVGen, a novel quantization-aware training (QAT) framework tailored for high-performance and inference-efficient video DMs under extremely low-bit quantization (e.g., 4-bit or below). We begin with a theoretical analysis showing that reducing the gradient norm is essential to facilitate convergence in QAT. To this end, we introduce auxiliary modules (Φ) to mitigate large quantization errors, leading to significantly enhanced convergence. To remove the inference overhead of Φ, we propose a rank-decay strategy that progressively phases Φ out. Specifically, we repeatedly apply singular value decomposition (SVD) together with a proposed rank-based regularization γ to identify and decay low-contributing components. This strategy retains performance while zeroing out inference overhead. Extensive experiments across 4 state-of-the-art (SOTA) video DMs, with parameter sizes ranging from 1.3B to 14B, show that QVGen is the first to reach quality comparable to full precision under 4-bit settings, and it significantly outperforms existing methods. For instance, our 3-bit CogVideoX-2B achieves improvements of +25.28 in Dynamic Degree and +8.43 in Scene Consistency on VBench.
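To make the auxiliary-module and rank-decay ideas concrete, the sketch below is a minimal PyTorch rendering under assumptions of my own: a symmetric fake-quantized linear layer augmented with a low-rank branch Φ = BA, plus a periodic SVD truncation step. The class and function names are hypothetical, and the paper's rank-based regularization γ (which pushes low-contributing components toward zero before each truncation) is omitted for brevity; this illustrates the mechanism, not the authors' implementation.

```python
import torch
import torch.nn as nn

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric uniform fake quantization with a straight-through
    estimator, so gradients pass through the rounding step during QAT."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp_min(1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()  # forward uses w_q, backward sees identity

class QuantLinearWithPhi(nn.Module):
    """Low-bit linear layer plus a low-rank auxiliary branch Phi = B @ A
    that absorbs quantization error during training (hypothetical sketch)."""
    def __init__(self, in_f: int, out_f: int, bits: int = 4, rank: int = 32):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02)
        self.bits = bits
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # (rank, in)
        self.B = nn.Parameter(torch.zeros(out_f, rank))        # (out, rank)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = fake_quantize(self.weight, self.bits)
        return x @ w_q.t() + (x @ self.A.t()) @ self.B.t()

    @torch.no_grad()
    def decay_rank(self, keep_ratio: float = 0.5) -> None:
        """One rank-decay step: factor Phi by SVD and keep only its
        strongest singular components. Applied on a schedule, this
        shrinks Phi's rank toward zero, removing inference overhead."""
        phi = self.B @ self.A                     # (out, in), rank-limited
        U, S, Vh = torch.linalg.svd(phi, full_matrices=False)
        r = int(self.A.shape[0] * keep_ratio)     # drop the weak tail
        self.A = nn.Parameter(S[:r, None] * Vh[:r])  # (r, in)
        self.B = nn.Parameter(U[:, :r])              # (out, r)
        # Note: optimizer state tied to A and B must be rebuilt after this.
```

In this sketch, calling decay_rank periodically halves Φ's rank until it reaches zero, at which point the layer reduces to a plain quantized linear with no extra inference cost.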
Community
The following similar papers were recommended by the Semantic Scholar API:
- PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement (2025)
- Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration (2025)
- GranQ: Granular Zero-Shot Quantization with Unified Layer-Channel Awareness (2025)
- Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis (2025)
- Training-free Diffusion Acceleration with Bottleneck Sampling (2025)
- Latent Video Dataset Distillation (2025)
- Precision Neural Network Quantization via Learnable Adaptive Modules (2025)