---
license: other
language:
- en
---

# MMR-12B (Mag-Mell-Reasoner-12B)

MMR-12B is based on [MN-12B-Mag-Mell-R1](https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1) in ChatML format and was trained using 4-bit QLoRA. Reasoning is structured via `<think>...</think>` tags. Unlike the Violet Magcap series, this model has **not** been merged into any GRPO (RLHF) model that uses the older `<reasoning>...</reasoning> <answer>...</answer>` formatting, nor has it undergone GRPO-style reinforcement training directly.
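As a minimal sketch of the expected prompt shape (the exact system prompt is an assumption; only the ChatML turn markers and `<think>` tags come from the card), a completion request would look like:

```python
# Hedged sketch of the ChatML format this model expects.
# The system message content is a placeholder; the <|im_start|>/<|im_end|>
# turn markers are standard ChatML, and reasoning is returned inside
# <think>...</think> tags rather than <reasoning>/<answer> blocks.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "What is 2 + 2?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# The model's reply is expected to begin with its reasoning, e.g.:
# <think>Simple arithmetic: 2 + 2 = 4.</think>
# 4
print(prompt)
```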

This run was redone after identifying RP formatting inconsistencies in later Magcap variants—specifically versions 1.5 and 1.9. Further updates in either series are unlikely for now, as I’m currently on hiatus to focus on other matters.

## Dataset Composition
- 5,000 non-reasoning toxicity data entries  
- 5,000 non-reasoning Reddit NSFW entries  
- 2,200 non-reasoning entries from A.R.E.S (Army - Resistance - Energetics - Survival)  
- 5,000 reasoning instruction data entries  
- 2,225 RP reasoning data entries  

## Training Hyperparameters
- Epochs: 2  
- Learning Rate: 2e-4  
- Gradient Accumulation: 4  
- Batch Size: 8  
- LoRA Rank: 64  
- LoRA Alpha: 64
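
The adapter settings above can be expressed with PEFT's `LoraConfig`; this is a hedged sketch only, since the card lists rank and alpha but not the target modules or dropout (both marked as assumptions below):

```python
# Sketch of a QLoRA adapter config matching the listed hyperparameters.
# r and lora_alpha come from the card; target_modules and lora_dropout
# are assumptions (typical choices for Mistral-family models) and were
# not specified in the original training details.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                 # LoRA Rank: 64 (from the card)
    lora_alpha=64,        # LoRA Alpha: 64 (from the card)
    lora_dropout=0.0,     # assumption: not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)
```

With gradient accumulation of 4 and a per-device batch size of 8, the effective batch size per optimizer step works out to 32.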