robertgshaw2 committed
Commit 1865eb1 · Parent: 8a52c2d
Update README.md

README.md CHANGED
```diff
@@ -9,7 +9,7 @@ tags:
 - int4
 ---
 
-##
+## openhermes-2.5-mistral-7b
 This repo contains model files for [OpenHermes-2.5-Mistral-7b](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) optimized for [nm-vllm](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.
 
 This model was quantized with [GPTQ](https://arxiv.org/abs/2210.17323) and saved in the Marlin format for efficient 4-bit inference. Marlin is a highly optimized inference kernel for 4-bit models.
```
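Since the README describes serving this Marlin-format model with nm-vllm, a minimal inference sketch may be useful; it uses the standard vLLM `LLM`/`SamplingParams` API that nm-vllm exposes. The model id below is an assumption (this commit page does not state the repo id), and running it requires a CUDA GPU with Marlin int4 kernel support:

```python
# Minimal sketch: serve the 4-bit Marlin checkpoint with the vLLM API.
# NOTE: the model id is assumed for illustration; substitute this repo's id.
from vllm import LLM, SamplingParams

model_id = "neuralmagic/OpenHermes-2.5-Mistral-7B-marlin"  # assumed repo id

llm = LLM(model_id)  # weights are loaded in the Marlin 4-bit format
params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["What is weight quantization?"], params)
print(outputs[0].outputs[0].text)
```

Because Marlin stores weights pre-packed for its fused int4 GEMM kernel, no extra quantization flags are needed at load time; the engine detects the format from the checkpoint config.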