FP8 Dynamic Version for vLLM
#5 by brandonbeiler
https://huggingface.co/brandonbeiler/Skywork-R1V3-38B-FP8-Dynamic
I uploaded an FP8 version of this model, quantized with llm-compressor for vLLM inference. It currently seems to struggle a bit with vLLM's --enable-reasoning flag, but inference is otherwise fast and accurate.
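
For anyone who wants to try it, here is a rough sketch of how the server launch might look. It only builds the `vllm serve` command line and prints it. The helper name `build_serve_command` is made up for illustration, and `--trust-remote-code` and port 8000 are my assumptions rather than confirmed requirements for this checkpoint. The reasoning flag is left out on purpose, given the issue mentioned above.

```python
import shlex

# Repository id taken from the link in this post.
MODEL_ID = "brandonbeiler/Skywork-R1V3-38B-FP8-Dynamic"


def build_serve_command(model_id: str = MODEL_ID, port: int = 8000) -> list:
    """Return an argv list for launching vLLM's OpenAI-compatible server.

    Hypothetical helper for illustration only; adjust flags to your setup.
    """
    return [
        "vllm", "serve", model_id,
        "--trust-remote-code",  # assumption: the repo relies on custom model code
        "--port", str(port),    # assumption: 8000 is used as the port
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(build_serve_command()))
```

Pass the returned list to `subprocess.run` if you would rather launch the server from Python; that needs vLLM installed and a GPU with enough memory for a 38B model in FP8.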