Why not FP8 with static and per-tensor quantization?
by wanzhenchn
Thanks a lot. I found that the config.json and recipe.yaml show dynamic FP8 quantization was used. I have the following questions:
- Why not static, per-tensor quantization?
- The ignore list in recipe.yaml is shown below. Why are the 're:.*self_attn', 're:.*feed_forward.gate_proj', 're:.*feed_forward.up_proj', and 're:.*feed_forward.down_proj' layers ignored?
ignore: ['re:.*lm_head', 're:.*self_attn', 're:.*router', 're:.*vision_model', 're:.*multi_modal_projector',
're:.*shared_expert', 're:.*feed_forward.gate_proj', 're:.*feed_forward.up_proj',
're:.*feed_forward.down_proj']
Could you share the code for using the llmcompressor toolkit to produce the FP8-dynamic/static model?
We are following the standard set by https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 for now. I do think per-channel (weight) and per-token (activation) scales are needed to preserve accuracy for this model. We are exploring more aggressive quantization ablations right now, but this is what we wanted to push first.
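For reference, here is a minimal sketch of how an FP8-dynamic checkpoint can be produced with llm-compressor (per-channel weight scales, per-token dynamic activation scales, no calibration data needed). This is not the exact script used for this repository: the model ID, model class, and save directory are placeholders, and import paths may differ slightly between llm-compressor versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # placeholder; substitute the model to quantize
SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-dynamic"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_DYNAMIC = per-channel static weight scales + per-token dynamic activation
# scales, so no calibration dataset is required. The ignore list mirrors the
# recipe.yaml above (lm_head, attention, router, vision tower, shared experts,
# and the dense feed_forward projections stay in the original precision).
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=[
        "re:.*lm_head", "re:.*self_attn", "re:.*router", "re:.*vision_model",
        "re:.*multi_modal_projector", "re:.*shared_expert",
        "re:.*feed_forward.gate_proj", "re:.*feed_forward.up_proj",
        "re:.*feed_forward.down_proj",
    ],
)

oneshot(model=model, recipe=recipe)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

For a static, per-tensor variant, scheme="FP8" would be used instead; that scheme requires a calibration dataset to fix the activation scales ahead of time, and it is where the accuracy concerns mentioned above come into play.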