FP8 Dynamic Version for vLLM

#5
by brandonbeiler - opened

https://huggingface.co/brandonbeiler/Skywork-R1V3-38B-FP8-Dynamic

Uploaded an FP8 version of this model, made with llm-compressor for vLLM inference. Currently, it seems to struggle a bit with the --enable-reasoning flag in vLLM, but inference is fast and accurate.
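For reference, a minimal sketch of how such an FP8 checkpoint might be served with vLLM's reasoning support; the exact flags depend on your vLLM version (reasoning output parsing was introduced around v0.7), and the parser name here is an assumption:

```shell
# Serve the FP8 checkpoint with vLLM's OpenAI-compatible server.
# llm-compressor FP8 checkpoints are typically auto-detected, so no
# explicit --quantization flag should be needed.
# NOTE: --reasoning-parser deepseek_r1 is an assumption; check which
# parsers your vLLM version supports.
vllm serve brandonbeiler/Skywork-R1V3-38B-FP8-Dynamic \
    --enable-reasoning \
    --reasoning-parser deepseek_r1 \
    --max-model-len 32768
```

If the reasoning parser fails (as described above), the model can still be served without the last two flags and the raw output parsed client-side.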

Skywork org

Thanks for your interest in Skywork-R1V3! This issue occurred because we didn't include the reasoning special tokens in token_config.json. You can refer to this link for a similar discussion.
