HunyuanVideo-Foley FP8 Quantized

This is an FP8-quantized version of tencent/HunyuanVideo-Foley, optimized for reduced VRAM usage while maintaining audio generation quality.

Quantization Details

  • Quantization Method: FP8 E5M2 & E4M3FN weight-only quantization
  • Layers Quantized: Transformer block weights only (attention and FFN layers)
  • Preserved Precision: Normalization layers, embeddings, and biases remain in original precision
  • Expected VRAM Savings: ~30-40% reduction compared to the BF16 original
  • Memory Usage: Enables running on GPUs with under 12 GB of VRAM when combined with other optimizations
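
As a rough sanity check on the savings figure, here is a back-of-the-envelope estimate. Both inputs are illustrative assumptions, not figures from this card: the parameter count is hypothetical, and the share of weights in transformer blocks is a guess.

```python
# Back-of-the-envelope estimate of weight-memory savings from
# selective FP8 quantization. All inputs are assumptions.
total_params = 10e9        # hypothetical parameter count
quantized_frac = 0.70      # assumed share of weights in transformer blocks

bf16_bytes = total_params * 2                              # BF16: 2 bytes/param
mixed_bytes = (total_params * quantized_frac * 1           # FP8: 1 byte/param
               + total_params * (1 - quantized_frac) * 2)  # rest stays BF16

print(f"Weight memory saved: {1 - mixed_bytes / bf16_bytes:.0%}")  # -> 35%
```

Under those assumptions, weight storage shrinks by roughly a third, consistent with the ~30-40% figure above (activations and framework overhead are extra).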

Usage

ComfyUI (Recommended)

This model is specifically optimized for use with the ComfyUI-HunyuanVideo-Foley custom node, which provides:

  • VRAM-friendly loading with ping-pong memory management
  • Built-in FP8 support that automatically handles the quantized weights
  • Torch compile integration for ~30% speed improvements after first run
  • Text-to-Audio and Video-to-Audio modes
  • Batch generation with audio selection tools

Installation:

  1. Install the ComfyUI node: ComfyUI-HunyuanVideo-Foley
  2. Download this quantized model to ComfyUI/models/foley/
  3. With the node's VRAM-friendly loading enabled, usage can drop below 8 GB while keeping audio quality high
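
Step 2 can also be scripted. Below is a minimal download sketch using huggingface_hub; the repo id and target directory are assumptions to adapt to your setup.

```python
# Minimal download sketch; repo_id and local_dir are assumptions.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="phazei/HunyuanVideo-Foley",                  # assumed repo id
    filename="hunyuanvideo_foley_fp8_e4m3fn.safetensors",
    local_dir="ComfyUI/models/foley",                     # target folder from step 2
)
```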

Typical VRAM Usage (5s audio, 50 steps):

  • Baseline (BF16): ~10-12 GB
  • With FP8 quantization: ~8-10 GB
  • Perfect for RTX 3080/4070 Ti and similar GPUs

Other Frameworks

The FP8 weights can be used with any framework that supports automatic upcasting of FP8 to FP16/BF16 during computation. The quantized weights maintain compatibility with the original model architecture.
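
As an illustration of that pattern, here is a minimal loading sketch in PyTorch (assuming torch >= 2.1 for the float8 dtypes): it reads the checkpoint with safetensors and upcasts any FP8 tensors to BF16 before handing them to the model.

```python
# Sketch: load FP8 weights and upcast to BF16 for compute.
# Assumes torch >= 2.1, which provides torch.float8_e4m3fn.
import torch
from safetensors.torch import load_file

state_dict = load_file("hunyuanvideo_foley_fp8_e4m3fn.safetensors")
upcast = {
    name: t.to(torch.bfloat16) if t.dtype == torch.float8_e4m3fn else t
    for name, t in state_dict.items()
}
# `upcast` matches the original architecture's state dict, since only the
# storage dtype changed; load it with model.load_state_dict(upcast).
```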

Files

  • hunyuanvideo_foley_fp8_e4m3fn.safetensors - Main model weights in FP8 (E4M3FN) format
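
To confirm which tensors are stored in FP8 without loading any weights, you can read the safetensors header directly: the format begins with a little-endian u64 header length followed by a JSON table of tensor metadata. A small sketch:

```python
# Count tensor dtypes from the safetensors JSON header (no weights loaded).
import json, struct
from collections import Counter

path = "hunyuanvideo_foley_fp8_e4m3fn.safetensors"
with open(path, "rb") as f:
    header_len = struct.unpack("<Q", f.read(8))[0]   # little-endian u64
    header = json.loads(f.read(header_len))

dtypes = Counter(v["dtype"] for k, v in header.items() if k != "__metadata__")
print(dtypes)  # e.g. Counter({'F8_E4M3': ..., 'BF16': ...})
```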

Performance Notes

  • Quality: Maintains comparable audio generation quality to the original model
  • Speed: The FP8-to-BF16 upcast adds negligible overhead; generation speed depends on the compute precision used at runtime
  • Memory: Significant VRAM reduction makes the model accessible on consumer GPUs
  • Compatibility: Drop-in replacement for the original model weights

Original Model

This quantization is based on tencent/HunyuanVideo-Foley. Please refer to the original repository for:

  • Model architecture details
  • Training information
  • License terms
  • Citation information

Technical Details

The quantization uses a conservative approach that only converts transformer block weights while preserving precision-sensitive components:

  • ✅ Converted: Attention and FFN layer weights in transformer blocks
  • ❌ Preserved: Normalization layers, embeddings, projections, bias terms

This selective quantization strategy maintains model quality while maximizing memory savings.
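
Below is a minimal sketch of what such a selective pass might look like. The `blocks.` key prefix and the substring markers are hypothetical; the real checkpoint's key naming may differ. Note that embeddings and projections outside the transformer blocks are preserved automatically here because they fail the prefix check.

```python
# Selective weight-only FP8 cast: convert transformer-block weights,
# keep precision-sensitive tensors untouched. Key names are assumptions.
import torch

SKIP_MARKERS = ("norm", "embed", "bias")  # precision-sensitive parts

def quantize_selectively(state_dict):
    out = {}
    for name, tensor in state_dict.items():
        in_block = name.startswith("blocks.")          # hypothetical prefix
        sensitive = any(m in name for m in SKIP_MARKERS)
        if in_block and not sensitive and tensor.is_floating_point():
            out[name] = tensor.to(torch.float8_e4m3fn)  # weight-only cast
        else:
            out[name] = tensor                          # original precision
    return out
```

Substring matching like this is deliberately coarse; a production conversion would enumerate exact module names, but the sketch captures the convert-or-preserve decision described above.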
