Update README.md
README.md
CHANGED
@@ -210,8 +210,6 @@ print("\nGenerated Output:\n", decoded_output)
 
 ### Using vLLM (for optimized FP8 inference)
 > [!NOTE]
-> **Disclaimer for vLLM Usage:**
->
 > The following vLLM inference example is provided as a general guideline based on `llm-compressor`'s intended compatibility with vLLM for FP8 models. However, at the time of writing, this specific quantized model (`textgeflecht/Devstral-Small-2505-FP8-llmcompressor`) with its `MistralTokenizer` (using `tekken.json`) **has not been explicitly tested with vLLM by the author.**
 >
 > Successfully running this model with vLLM might require:
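For orientation only, below is a minimal sketch of what loading this checkpoint in vLLM could look like. It is not the README's actual example (which is outside this hunk) and, per the note above, it is untested with this model; in particular, `tokenizer_mode="mistral"` is an assumption about how vLLM would pick up the `tekken.json` tokenizer, and the sampling settings are illustrative.

```python
# Untested sketch: serving the FP8 llm-compressor checkpoint with vLLM.
# Assumes vLLM can load the compressed weights directly and that
# tokenizer_mode="mistral" is what the tekken.json tokenizer requires.
from vllm import LLM, SamplingParams

llm = LLM(
    model="textgeflecht/Devstral-Small-2505-FP8-llmcompressor",
    tokenizer_mode="mistral",  # assumption: needed for MistralTokenizer / tekken.json
)

sampling_params = SamplingParams(temperature=0.2, max_tokens=256)

# llm.chat() applies the model's chat template to OpenAI-style messages.
outputs = llm.chat(
    [{"role": "user", "content": "Write a Python function that reverses a string."}],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```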