How to quantize the hunyuan model to fp8

by hz094 - opened Dec 25, 2024

Discussion

hz094

Dec 25, 2024

Hi sir, thanks for the excellent work. I am curious about how you quantized the hunyuan model; could you share more details?

calcuis

Owner Dec 25, 2024

edited 28 days ago

You need torch and llama.cpp. You could try converting the safetensors to GGUF and testing it first; simply execute: ggc t

Actually, if you just want fp8, the updated node has a tool, tensor cutter, that helps you make your own fp8 scaled model (50% smaller in file size) in an easy way; you don't need llama.cpp or any extra dependency in that case.
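For readers wondering what "fp8 scaled" roughly means, here is a minimal pure-Python sketch of per-tensor scaled quantization to the float8 E4M3 value grid. It is only an illustration of the general idea and not the tensor cutter's actual code: the function names (`round_to_e4m3`, `quantize_scaled`, `dequantize`) are hypothetical, the per-tensor scaling scheme is an assumption, and the 50% size figure assumes the source weights are stored in a 16-bit format.

```python
import math

# Largest finite value of the float8 E4M3 ("fn") format.
E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value representable in E4M3 (3 mantissa bits)."""
    if x == 0.0:
        return 0.0
    a = min(abs(x), E4M3_MAX)          # saturate instead of overflowing
    _, e = math.frexp(a)               # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)               # -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)            # spacing of representable values here
    q = round(a / step) * step         # round() uses ties-to-even
    return math.copysign(min(q, E4M3_MAX), x)


def quantize_scaled(weights):
    """Hypothetical per-tensor scheme: map the largest |w| onto E4M3_MAX."""
    amax = max((abs(w) for w in weights), default=0.0)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate weights from fp8-grid values and the stored scale."""
    return [q * scale for q in quantized]


def size_in_bytes(n_params: int, bytes_per_param: int) -> int:
    """Raw tensor payload size, ignoring file headers and metadata."""
    return n_params * bytes_per_param


if __name__ == "__main__":
    w = [0.013, -0.27, 0.5, -2.0]
    q, s = quantize_scaled(w)
    print("scale:", s)
    print("restored:", dequantize(q, s))
    # Assumption: 2 bytes per weight before (16-bit), 1 byte after (fp8).
    print("size ratio:", size_in_bytes(1000, 1) / size_in_bytes(1000, 2))
```

The stored scale is what lets the small fp8 range cover the tensor's real value range; a real tool would also need to write the scale alongside each tensor in the output file.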

calcuis changed discussion status to closed 28 days ago

calcuis changed discussion status to open 28 days ago
