Nitral-AI / Mag-Mell-Reasoner-12B

Nitral-AI committed on May 4

Commit 135b96d · verified · Parent: 8f687ee

Update README.md

Files changed (1)

README.md +1 -1

@@ -6,7 +6,7 @@ language:
 # MMR-12B (Mag-Mell-Reasoner-12B)
-MMR-12B is based on [MN-12B-Mag-Mell-R1](https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1) in ChatML format and was trained using 4-bit QLoRA with supervised fine-tuning. Reasoning is structured via `<think>...</think>` tags. Unlike the Violet Magcap series, this model has **not** been merged into any GRPO (RLHF) model that uses the older `<reasoning>...</reasoning> <answer>...</answer>` formatting, nor has it undergone GRPO-style reinforcement training directly.
+MMR-12B is based on [MN-12B-Mag-Mell-R1](https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1) in ChatML format and was trained using 4-bit QLoRA. Reasoning is structured via `<think>...</think>` tags. Unlike the Violet Magcap series, this model has **not** been merged into any GRPO (RLHF) model that uses the older `<reasoning>...</reasoning> <answer>...</answer>` formatting, nor has it undergone GRPO-style reinforcement training directly.
 This run was redone after identifying RP formatting inconsistencies in later Magcap variants—specifically versions 1.5 and 1.9. Further updates in either series are unlikely for now, as I’m currently on hiatus to focus on other matters.
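
Because the README says reasoning is wrapped in `<think>...</think>` tags, a client will usually want to separate that reasoning from the visible reply. The following is a minimal sketch, not part of the model repository: the function name `split_reasoning` and the assumption that the tags appear literally in the decoded text are illustrative only.

```python
import re
from typing import Tuple

# Matches one <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a decoded completion into (reasoning, answer).

    Hypothetical helper: assumes the tags appear verbatim in the output.
    If no <think> block is present, reasoning is an empty string.
    """
    # Collect the contents of every reasoning block, trimmed of whitespace.
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(text))
    # Whatever remains once the blocks are removed is the visible reply.
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = "<think>The user greets me.\nReply politely.</think>\nHello there!"
    thoughts, reply = split_reasoning(sample)
    print(thoughts)
    print(reply)
```

The non-greedy `(.*?)` with `re.DOTALL` keeps multi-line reasoning in a single match and stops at the first closing tag, so several separate blocks in one completion are handled independently.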