🧪 Experimental Model

This is one of many experimental iterations I'm sharing publicly while I mess around with training parameters and ideas. It's not a "real" release - just me being transparent about my learning process. Feel free to look under the hood, but don't expect anything production-ready!


Denker-mistral-nemo-12B

Denker is a small, uncensored, reasoning-focused model finetuned using ORPO and QLoRA on top of mistral-nemo-kartoffel-12B.

This run experiments with the Qwen-style chat template and a <think>...</think> reasoning structure, without modifying the base vocabulary. All tuning was done via LoRA.
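Since the exact template string isn't reproduced here, the snippet below is only a minimal inference sketch: it assumes the Qwen/ChatML-style template is baked into the tokenizer's chat template and that reasoning is emitted as plain-text <think>...</think> tags (no new special tokens). The prompt and generation settings are illustrative.

```python
# Minimal inference sketch: assumes the Qwen-style template is exposed via
# tokenizer.apply_chat_template and that <think> tags are ordinary text tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/Denker-mistral-nemo-12B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
# Expected shape of a "thinking" reply:
# <think> ...step-by-step reasoning... </think>
# 408
```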

Finetuning Details

  • Method: ORPO
  • Epochs: 0.25
  • Learning Rate: 8e-6, cosine decay w/ 5% warmup
  • Batch Size: 1 per device x 64 gradient accumulation steps (64 effective)
  • Max Grad Norm: 0.5
  • LoRA Rank: 128
  • Hardware: 1x NVIDIA RTX A6000
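For readers who want to reproduce something similar, here is a rough reconstruction of how these hyperparameters could map onto trl's ORPOTrainer with a QLoRA setup via peft. Only the values listed above come from this card; the target modules, LoRA alpha, ORPO beta, and the dataset file are assumptions.

```python
# Hypothetical training-setup sketch (trl + peft + bitsandbytes); the listed
# hyperparameters (LR, schedule, batch size, grad norm, LoRA rank, epochs)
# are plugged in, everything else is an assumption.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base = "nbeerbower/mistral-nemo-kartoffel-12B"
tokenizer = AutoTokenizer.from_pretrained(base)

# QLoRA: load the frozen base model in 4-bit NF4 and train only LoRA adapters.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb, device_map="auto"
)

peft_config = LoraConfig(
    r=128,                    # LoRA rank from the card
    lora_alpha=128,           # assumption: alpha is not stated
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

args = ORPOConfig(
    output_dir="denker-orpo",
    num_train_epochs=0.25,
    learning_rate=8e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,   # 1 x 64 = 64 effective
    max_grad_norm=0.5,
    bf16=True,
)

# ORPO expects "prompt"/"chosen"/"rejected" columns; the file name is a placeholder.
train_dataset = load_dataset("json", data_files="orpo_pairs.jsonl", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```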

Dataset Composition

Thinking disabled:

Chain of Thought

30,000 samples of each dataset with thinking enabled.
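ORPO trains on preference pairs, so each sample presumably carries a prompt plus a chosen and a rejected reply. The records below are entirely hypothetical and only illustrate how a thinking-enabled (Chain of Thought) sample might differ from a thinking-disabled one in that layout.

```python
# Hypothetical preference records; field names follow trl's ORPO convention,
# and the contents are invented purely for illustration.
thinking_enabled_sample = {
    "prompt": "What is 17 * 24?",
    "chosen": "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>\n408",
    "rejected": "398",  # plausible-but-wrong answer as the rejected turn
}

thinking_disabled_sample = {
    "prompt": "Name the capital of France.",
    "chosen": "Paris.",                      # direct answer, no <think> block
    "rejected": "I can't help with that.",
}
```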

Results

Observations

The model will sometimes decide not to think, skipping the <think> block and answering directly.
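If you extract the reasoning from replies downstream, it's worth tolerating that behavior. A small helper (not part of the model card, just a sketch) might look like this:

```python
# Split a reply into (reasoning, answer), tolerating replies without a <think> block.
import re

def split_reasoning(reply: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when the model chose not to think."""
    match = re.search(r"<think>(.*?)</think>", reply, flags=re.DOTALL)
    if match is None:
        return "", reply.strip()
    return match.group(1).strip(), reply[match.end():].strip()

print(split_reasoning("<think>340 + 68 = 408</think>\n408"))  # ('340 + 68 = 408', '408')
print(split_reasoning("408"))                                  # ('', '408')
```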

Evals

TBD
