Nitral-AI committed on
Commit 135b96d · verified · 1 Parent(s): 8f687ee

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -6,7 +6,7 @@ language:
 
  # MMR-12B (Mag-Mell-Reasoner-12B)
 
- MMR-12B is based on [MN-12B-Mag-Mell-R1](https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1) in ChatML format and was trained using 4-bit QLoRA with supervised fine-tuning. Reasoning is structured via `<think>...</think>` tags. Unlike the Violet Magcap series, this model has **not** been merged into any GRPO (RLHF) model that uses the older `<reasoning>...</reasoning> <answer>...</answer>` formatting, nor has it undergone GRPO-style reinforcement training directly.
+ MMR-12B is based on [MN-12B-Mag-Mell-R1](https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1) in ChatML format and was trained using 4-bit QLoRA. Reasoning is structured via `<think>...</think>` tags. Unlike the Violet Magcap series, this model has **not** been merged into any GRPO (RLHF) model that uses the older `<reasoning>...</reasoning> <answer>...</answer>` formatting, nor has it undergone GRPO-style reinforcement training directly.
 
  This run was redone after identifying RP formatting inconsistencies in later Magcap variants—specifically versions 1.5 and 1.9. Further updates in either series are unlikely for now, as I’m currently on hiatus to focus on other matters.
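
The README text in this diff says reasoning is wrapped in `<think>...</think>` tags. As a rough illustration of what that means for a consumer of the model's output, here is a minimal Python sketch that separates the reasoning from the visible reply. The function name `split_reasoning` and the sample string are hypothetical, and the sketch assumes at most one `<think>` block per response; none of this comes from the model card itself.

```python
import re

# Assumption: a response contains at most one <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty when no tags are present."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the tagged block is treated as the visible reply.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    # Hypothetical model output, for illustration only.
    sample = "<think>The user greets me; reply briefly.</think>\nHello there!"
    reasoning, answer = split_reasoning(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

A front end could show `answer` to the user and keep `reasoning` collapsed or hidden. Output in the older `<reasoning>...</reasoning> <answer>...</answer>` layout mentioned in the README would need a different pattern.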