MMR-12B (Mag-Mell-Reasoner-12B)

MMR-12B is based on MN-12B-Mag-Mell-R1, uses the ChatML prompt format, and was trained with 4-bit QLoRA. Reasoning is structured via <think>...</think> tags. Unlike the Violet Magcap series, this model has not been merged into any GRPO (RLHF) model that uses the older <reasoning>...</reasoning> <answer>...</answer> formatting, nor has it undergone GRPO-style reinforcement training directly.
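
A minimal inference sketch, assuming the standard transformers API. The system prompt wording and generation settings are illustrative assumptions, not from the card; only the model ID and the <think> tag convention come from the source:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nitral-AI/Mag-Mell-Reasoner-12B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    # Hypothetical system prompt; the card does not specify how to elicit reasoning.
    {"role": "system", "content": "Think inside <think>...</think> tags before answering."},
    {"role": "user", "content": "Why is the sky blue?"},
]

# ChatML formatting is applied by the tokenizer's chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)
text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Split the reasoning trace from the final reply on the closing tag.
think, _, answer = text.partition("</think>")
print(answer.strip())
```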

This run was redone after RP formatting inconsistencies were identified in later Magcap variants, specifically versions 1.5 and 1.9. Further updates in either series are unlikely for now, as I’m currently on hiatus to focus on other matters.

Dataset Composition

  • 5,000 non-reasoning toxicity data entries
  • 5,000 non-reasoning Reddit NSFW entries
  • 2,200 non-reasoning entries from A.R.E.S (Army - Resistance - Energetics - Survival)
  • 5,000 reasoning instruction data entries
  • 2,225 RP reasoning data entries
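
The mixture totals 19,425 examples. Below is a hedged sketch of how such a mixture could be assembled with the datasets library; the file names are hypothetical placeholders, since the card does not publish the underlying sources:

```python
from datasets import load_dataset, concatenate_datasets

# Hypothetical local JSONL files mapped to the counts listed above.
parts = {
    "toxicity.jsonl": 5000,
    "reddit_nsfw.jsonl": 5000,
    "ares.jsonl": 2200,
    "reasoning_instruct.jsonl": 5000,
    "rp_reasoning.jsonl": 2225,
}

splits = []
for path, n in parts.items():
    ds = load_dataset("json", data_files=path, split="train")
    # Subsample each source to its target count before mixing.
    splits.append(ds.shuffle(seed=42).select(range(min(n, len(ds)))))

mixed = concatenate_datasets(splits).shuffle(seed=42)  # 19,425 rows total
```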

Training Hyperparameters

  • Epochs: 2
  • Learning Rate: 2e-4
  • Gradient Accumulation: 4
  • Batch Size: 8
  • LoRA Rank: 64
  • LoRA Alpha: 64
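
These settings map naturally onto a PEFT/transformers QLoRA setup. The sketch below is one assumed reproduction of the run; the NF4 quantization settings, target modules, and LoRA dropout are not stated in the card:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization for QLoRA; NF4 and bf16 compute are assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Rank and alpha come from the card; modules and dropout are assumed.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    num_train_epochs=2,
    learning_rate=2e-4,
    per_device_train_batch_size=8,   # assuming "Batch Size: 8" is per device
    gradient_accumulation_steps=4,   # effective batch of 8 * 4 = 32 if so
    bf16=True,
    output_dir="mmr-12b-qlora",
)
```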