---
license: other
language:
- en
---
# MMR-12B (Mag-Mell-Reasoner-12B)
MMR-12B is a fine-tune of [MN-12B-Mag-Mell-R1](https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1), trained with 4-bit QLoRA and using the ChatML prompt format. Reasoning is structured via `<think>...</think>` tags. Unlike the Violet Magcap series, this model has **not** been merged into any GRPO (RLHF) model that uses the older `<reasoning>...</reasoning> <answer>...</answer>` formatting, nor has it undergone GRPO-style reinforcement training directly.
This run was redone after RP formatting inconsistencies were identified in later Magcap variants (specifically versions 1.5 and 1.9). Further updates to either series are unlikely for now, as I'm currently on hiatus to focus on other matters.
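For reference, a minimal sketch of what a prompt in this format might look like, assuming standard ChatML control tokens; the helper function and the example messages are illustrative, not part of the model's tokenizer config:

```python
# Hypothetical helper showing the ChatML layout with <think> reasoning tags.
# The system/user strings are placeholders; in practice, use the tokenizer's
# built-in chat template (tokenizer.apply_chat_template) where available.
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("You are a helpful assistant.", "What is 2 + 2?")
# The model is then expected to open its reply with reasoning wrapped in
# <think>...</think> tags, followed by the final answer.
```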
## Dataset Composition
- 5,000 non-reasoning toxicity data entries
- 5,000 non-reasoning Reddit NSFW entries
- 2,200 non-reasoning entries from A.R.E.S (Army - Resistance - Energetics - Survival)
- 5,000 reasoning instruction data entries
- 2,225 RP reasoning data entries
## Training Hyperparameters
- Epochs: 2
- Learning Rate: 2e-4
- Gradient Accumulation: 4
- Batch Size: 8
- LoRA Rank: 64
- LoRA Alpha: 64
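The hyperparameters above can be summarized as a plain configuration sketch; the dictionary keys mirror common PEFT/Transformers argument names but are assumptions, and any setting not listed on the card (e.g. target modules, scheduler) is omitted here:

```python
# Sketch of the training configuration listed above. Only the values stated
# on the card are included; key names follow common PEFT/Transformers
# conventions and are assumptions, not the author's actual training script.
qlora_config = {
    "load_in_4bit": True,   # 4-bit QLoRA base quantization
    "lora_rank": 64,
    "lora_alpha": 64,
}
training_config = {
    "num_train_epochs": 2,
    "learning_rate": 2e-4,
    "gradient_accumulation_steps": 4,
    "per_device_train_batch_size": 8,
}

# With gradient accumulation, the effective batch per optimizer step is
# batch_size * accumulation_steps = 8 * 4 = 32.
effective_batch = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
```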