Update README.md
README.md
CHANGED
@@ -210,8 +210,6 @@ print("\nGenerated Output:\n", decoded_output)
 
 ### Using vLLM (for optimized FP8 inference)
 > [!NOTE]
-> **Disclaimer for vLLM Usage:**
->
 > The following vLLM inference example is provided as a general guideline based on `llm-compressor`'s intended compatibility with vLLM for FP8 models. However, at the time of writing, this specific quantized model (`textgeflecht/Devstral-Small-2505-FP8-llmcompressor`) with its `MistralTokenizer` (using `tekken.json`) **has not been explicitly tested with vLLM by the author.**
 >
 > Successfully running this model with vLLM might require:
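For orientation only, below is a minimal sketch of what loading this checkpoint in vLLM could look like. It is not the README's actual example (which is outside this hunk) and, per the note above, it is untested with this model; in particular, `tokenizer_mode="mistral"` is an assumption about how vLLM would pick up the `tekken.json` tokenizer, and the sampling settings are illustrative.

```python
# Untested sketch: serving the FP8 llm-compressor checkpoint with vLLM.
# Assumes vLLM can load the compressed weights directly and that
# tokenizer_mode="mistral" is what the tekken.json tokenizer requires.
from vllm import LLM, SamplingParams

llm = LLM(
    model="textgeflecht/Devstral-Small-2505-FP8-llmcompressor",
    tokenizer_mode="mistral",  # assumption: needed for MistralTokenizer / tekken.json
)

sampling_params = SamplingParams(temperature=0.2, max_tokens=256)

# llm.chat() applies the model's chat template to OpenAI-style messages.
outputs = llm.chat(
    [{"role": "user", "content": "Write a Python function that reverses a string."}],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```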