FP8 Dynamic Version for vLLM

#5
by brandonbeiler - opened

https://huggingface.co/brandonbeiler/Skywork-R1V3-38B-FP8-Dynamic

Uploaded an FP8 version of this model, made with llm-compressor for vLLM inference. Currently, it seems to struggle a bit with the --enable-reasoning flag in vLLM, but inference is fast and accurate.
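For reference, a minimal sketch of how such an FP8 checkpoint might be served with vLLM's reasoning support; the exact flags depend on your vLLM version (reasoning output parsing was introduced around v0.7), and the parser name here is an assumption:

```shell
# Serve the FP8 checkpoint with vLLM's OpenAI-compatible server.
# llm-compressor FP8 checkpoints are typically auto-detected, so no
# explicit --quantization flag should be needed.
# NOTE: --reasoning-parser deepseek_r1 is an assumption; check which
# parsers your vLLM version supports.
vllm serve brandonbeiler/Skywork-R1V3-38B-FP8-Dynamic \
    --enable-reasoning \
    --reasoning-parser deepseek_r1 \
    --max-model-len 32768
```

If the reasoning parser fails (as described above), the model can still be served without the last two flags and the raw output parsed client-side.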

Skywork org

Thanks for your interest in Skywork-R1V3! This issue occurred because we didn't include the reasoning special tokens in token_config.json. You can refer to this link for a similar discussion.
