QuantFactory/MN-12B-Lyra-v4-GGUF

This is a quantized version of Sao10K/MN-12B-Lyra-v4, created using llama.cpp.

Original Model Card

Lyra

Mistral-NeMo-12B-Lyra-v4, a variation of Lyra-v4a1, layered over Lyra-v3, which was built on top of Lyra-v2a2, which itself was built upon Lyra-v2a1.

Model Versioning

[See Previous Models]
  |
Lyra-v4a1
  |
  ------------> Lyra-v4 [Separate RL step targeting Instruct and Coherency over base NeMo instead of SFT first. The result is merged with Lyra-v4a1, which fixes most quant-based issues. Somehow.]

This uses ChatML, or any of its variants which were included in previous versions.


<|im_start|>system
This is the system prompt.<|im_end|>
<|im_start|>user
Instructions placed here.<|im_end|>
<|im_start|>assistant
The model's response will be here.<|im_end|>
--------------------------------------------------
[INST]system
This is another system prompt.[/INST]
[INST]user
Your instructions placed here.[/INST]
[INST]assistant
The model's response will be here.[/INST]
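
The ChatML layout above can be assembled programmatically. The following is a minimal sketch, not the author's tooling: the function name `build_chatml_prompt` and the message dictionary shape are assumptions made for illustration. It renders each turn as `<|im_start|>{role}`, a newline, the content, and `<|im_end|>`, then leaves an assistant turn open for the model to complete.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(messages: List[Dict[str, str]]) -> str:
    """Render messages as ChatML and open an assistant turn for generation.

    Hypothetical helper: the name and the {"role", "content"} message shape
    are illustrative assumptions, not part of the model card.
    """
    parts = []
    for message in messages:
        # One turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(
            f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n"
        )
    # Leave the assistant turn open so the model writes the reply.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "This is the system prompt."},
    {"role": "user", "content": "Instructions placed here."},
])
print(prompt)
```

Comparing the printed string against what a frontend actually sends is one way to check that the template in use matches the format shown above.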

Recommended Samplers:

Temperature: 0.6 - 1 # Make sure min_p is set before Temperature in Sampler Orders
min_p: 0.1 - 0.2 # Crucial for NeMo
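
To illustrate why sampler order matters, here is a rough sketch of min_p filtering applied before temperature. It is a generic pure-Python illustration, not the implementation used by any particular backend; the function names and the toy logits are assumptions. min_p drops every token whose probability is below `min_p` times the top token's probability, and temperature then rescales only the survivors.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert logits to probabilities (subtracting the max for stability)."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def min_p_then_temperature(
    logits: Dict[str, float], min_p: float = 0.1, temperature: float = 0.8
) -> Dict[str, float]:
    """Hypothetical helper: apply min_p first, then temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Step 1: min_p on the unscaled distribution. Keep tokens whose
    # probability is at least min_p times the most likely token's.
    probs = softmax(logits)
    cutoff = min_p * max(probs.values())
    kept = {tok: logits[tok] for tok, p in probs.items() if p >= cutoff}
    # Step 2: temperature rescales only the tokens that survived step 1.
    scaled = {tok: val / temperature for tok, val in kept.items()}
    return softmax(scaled)


# Toy logits, made up for illustration.
dist = min_p_then_temperature({"the": 5.0, "a": 4.0, "zebra": 0.0})
print(dist)
```

With this order, unlikely tokens are removed based on the model's original confidence, so raising the temperature cannot bring them back into the candidate pool.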

Recommended Stopping Strings:

<|im_end|>
</s>
[/INST]
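
If a frontend does not handle stopping strings itself, the output can be trimmed after generation. The sketch below assumes a hypothetical helper named `truncate_at_stop`; it cuts the text at the earliest occurrence of any string in the list above.

```python
from typing import Sequence

# Stopping strings taken from the list above.
STOP_STRINGS = ("<|im_end|>", "</s>", "[/INST]")


def truncate_at_stop(text: str, stops: Sequence[str] = STOP_STRINGS) -> str:
    """Cut text at the earliest occurrence of any stop string.

    Hypothetical helper for illustration; returns the text unchanged
    when no stop string is present.
    """
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


print(truncate_at_stop("Hello there.<|im_end|>\n<|im_start|>user"))
```

Additional strings can be supplied through the `stops` argument, for example `truncate_at_stop(text, STOP_STRINGS + ("<extra>",))`, where `<extra>` is a placeholder for whatever string shows up in practice.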

Notes

- I think I fixed the extra token stuff some users seem to be facing, while retaining everything else? It's some error alright.
- If you're using XML tags, you may see weird malformed stopping strings. Just add them to your current list and move on.
- It's pretty nice, imo. I've been messing around with it a lot.
- Make sure the ChatML template is correct. I think there are some issues with the one used in SillyTavern, which might cause improper replies.

GGUF
Model size: 12.2B params
Architecture: llama

Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
