Experimental commander model V1.

Named it Zelensky in order to troll Uncle Elon on Twitter over how bad Grok-2 is.

Training process: one epoch at a low learning rate, then evolutionary merging with the three other models (listed on the model card).

The process was repeated multiple times on 8x AMD MI300 192GB GPUs, while also running gpqa_diamond_zeroshot on the LM Eval harness.
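For reference, an eval run like the one above could be invoked roughly as follows with lm-evaluation-harness. This is a sketch, not the exact command used here; the `dtype` and `batch_size` values are assumptions.

```shell
# Hypothetical invocation of lm-evaluation-harness against this model.
# gpqa_diamond_zeroshot is the task named above; other args are assumed.
lm_eval --model hf \
  --model_args pretrained=nisten/zelensky-78b,dtype=bfloat16 \
  --tasks gpqa_diamond_zeroshot \
  --batch_size auto
```

Running this between merge rounds gives a quick signal on whether a given evolutionary merge helped or hurt.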

Thank you Vultr https://www.vultr.com/register/ for sponsoring the compute.

Qwen License still applies by default.

Model size: 78B params
Tensor type: BF16
Format: Safetensors
Model tree for nisten/zelensky-78b

Base model: Qwen/Qwen2.5-72B (finetuned)
Quantizations: 3 models
