robertgshaw2 committed
Commit 1865eb1 · Parent: 8a52c2d
Update README.md

README.md CHANGED
```diff
@@ -9,7 +9,7 @@ tags:
 - int4
 ---
 
-##
+## openhermes-2.5-mistral-7b
 This repo contains model files for [OpenHermes-2.5-Mistral-7b](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) optimized for [nm-vllm](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.
 
 This model was quantized with [GPTQ](https://arxiv.org/abs/2210.17323) and saved in the Marlin format for efficient 4-bit inference. Marlin is a highly optimized inference kernel for 4-bit models.
```
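Since the README describes serving this Marlin-format model with nm-vllm, a minimal inference sketch may be useful; it uses the standard vLLM `LLM`/`SamplingParams` API that nm-vllm exposes. The model id below is an assumption (this commit page does not state the repo id), and running it requires a CUDA GPU with Marlin int4 kernel support:

```python
# Minimal sketch: serve the 4-bit Marlin checkpoint with the vLLM API.
# NOTE: the model id is assumed for illustration; substitute this repo's id.
from vllm import LLM, SamplingParams

model_id = "neuralmagic/OpenHermes-2.5-Mistral-7B-marlin"  # assumed repo id

llm = LLM(model_id)  # weights are loaded in the Marlin 4-bit format
params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["What is weight quantization?"], params)
print(outputs[0].outputs[0].text)
```

Because Marlin stores weights pre-packed for its fused int4 GEMM kernel, no extra quantization flags are needed at load time; the engine detects the format from the checkpoint config.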