This is a GRPO-trained version of my Control Nanuq model, made to fuck around with GRPO training. This model is highly experimental: it's *supposed* to do its reasoning inside XML tags, but for some reason it doesn't. Possibly I need to train for more epochs.
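
If you want to check whether a completion actually follows the XML reasoning format, a format reward is the usual lever in GRPO. Below is a minimal sketch, not the reward used for this model. The `<reasoning>`/`<answer>` tag names are an assumption (the card only says "XML tags"), and the function names are hypothetical. It gives a strict all-or-nothing score plus partial credit per tag, so the policy still gets some signal before it produces the full layout.

```python
import re

# ASSUMPTION: tag names are a guess; the card only says "XML tags".
FORMAT_RE = re.compile(
    r"^\s*<reasoning>\s*(?P<reasoning>.+?)\s*</reasoning>\s*"
    r"<answer>\s*(?P<answer>.+?)\s*</answer>\s*$",
    re.DOTALL,  # let reasoning/answer bodies span multiple lines
)


def xml_format_reward(completion: str) -> float:
    """Strict check: 1.0 only if the whole completion matches the assumed layout."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def soft_xml_reward(completion: str) -> float:
    """Partial credit: 0.25 for each expected tag that appears exactly once."""
    tags = ("<reasoning>", "</reasoning>", "<answer>", "</answer>")
    return sum(0.25 for tag in tags if completion.count(tag) == 1)


def format_rewards(completions: list[str]) -> list[float]:
    """Score a batch of completions, one combined reward per completion."""
    return [xml_format_reward(c) + soft_xml_reward(c) for c in completions]


if __name__ == "__main__":
    good = "<reasoning>\n2+2 is 4\n</reasoning>\n<answer>\n4\n</answer>\n"
    print(format_rewards([good, "The answer is 4."]))
```

A model that never emits the tags scores 0.0 on both signals, which is one way to confirm the behaviour described above before deciding whether more epochs would help.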

Trained on 1x A100 80GB provided by Lucyknada, using Unsloth. If you're trying to replicate the model: one, don't. Two, swap the default L3.1 8B model in the Colab notebook for Control Nanuq.


Format: Safetensors
Model size: 8.03B params
Tensor type: BF16
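
As a rough guide to hardware, the weight footprint follows directly from the numbers above: parameters times bytes per parameter, with BF16 using 2 bytes. The sketch below (function name hypothetical) gives roughly 16 GB for 8.03B params. This covers the weights only; GRPO training also needs memory for activations, sampled generations, and any optimizer state, so real usage is higher.

```python
def weight_memory_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Raw weight storage: parameter count x bytes per parameter (BF16 = 2)."""
    return n_params * bytes_per_param


n_params = 8.03e9  # "8.03B params" from the model card
bf16_bytes = weight_memory_bytes(n_params)

# Report in decimal gigabytes and binary gibibytes.
print(f"{bf16_bytes / 1e9:.2f} GB ({bf16_bytes / 2**30:.2f} GiB)")
```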


Model tree for NewEden/Control-Nanuq-GRPO:
Base model: Delta-Vector/Control-Nanuq-8B (this model is a finetune of it)
Quantizations: 1 model