ganglii/DisCO-1.5B-Lratio

DisCO-1.5B-Lratio

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the agentica-org/DeepScaleR-Preview-Dataset.

It was fine-tuned as part of the paper DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization (arXiv:2505.12366). Specifically, this model was fine-tuned with the DisCO framework using the likelihood-ratio (L-ratio) score function.
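To make the L-ratio score function more concrete, here is a minimal Python sketch. It is a reading of the paper, not a copy of the released code, and it assumes two things. First, the score of a response is the length-normalized average of per-token probability ratios between the current policy and the old (sampling) policy. Second, the discriminative objective for a question rewards higher scores on correct responses than on incorrect ones. The names `lratio_score` and `discriminative_objective` are hypothetical. The sketch also omits the constraint in the method's name and the other refinements of the full DisCO objective. See the repository linked below for the actual implementation.

```python
import math
from typing import List, Sequence


def lratio_score(logp_new: Sequence[float], logp_old: Sequence[float]) -> float:
    # Assumed L-ratio score: (1/|o|) * sum_t pi_theta(o_t | q, o_<t) / pi_old(o_t | q, o_<t),
    # computed from per-token log-probabilities of the same response under both policies.
    if len(logp_new) != len(logp_old) or len(logp_new) == 0:
        raise ValueError("need equal-length, non-empty per-token log-prob lists")
    ratios = [math.exp(new - old) for new, old in zip(logp_new, logp_old)]
    return sum(ratios) / len(ratios)


def discriminative_objective(pos_scores: List[float], neg_scores: List[float]) -> float:
    # Simplified objective for one question: mean score of correct responses
    # minus mean score of incorrect responses (hypothetical, constraint omitted).
    # A question whose samples are all correct or all incorrect contributes nothing.
    if not pos_scores or not neg_scores:
        return 0.0
    return sum(pos_scores) / len(pos_scores) - sum(neg_scores) / len(neg_scores)


# Toy usage: per-token log-probs of one correct and one incorrect response.
old_pos = [-1.2, -0.7, -2.0]
old_neg = [-1.2, -0.7, -2.0]
pos = lratio_score([-1.0, -0.6, -1.9], old_pos)  # policy raised these tokens: score > 1
neg = lratio_score([-1.4, -0.8, -2.1], old_neg)  # policy lowered these tokens: score < 1
print(discriminative_objective([pos], [neg]))  # > 0 here
```

Maximizing an objective of this shape pushes the policy to raise the likelihood of correct responses relative to incorrect ones for the same question, which is the discriminative idea the paper's title refers to.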

The code is available at: https://github.com/Optimization-AI/DisCO

Below are comparisons with baseline models and baseline methods for fine-tuning 1.5B models, with OpenAI-o1-preview included as a reference. MRL denotes the maximum response length used in training/testing. The bottom 9 rows all fine-tune the DeepSeek-R1-Distill-Qwen-1.5B model on the same DeepScaleR dataset, and this model corresponds to the DisCO (L-ratio) row. DS is short for DeepSeek-R1, DSR is short for DeepScaleR, and O-Bench denotes OlympiadBench.

Model	MRL(Train/Test)	AIME 2024	AIME 2025	MATH 500	AMC 2023	Minerva	O-Bench	Avg.
OpenAI-o1-Preview	-	0.4	-	0.814	-	-	-	-
DS-Distill-Qwen-1.5B	32k+ / 32k	0.288	0.263	0.828	0.629	0.265	0.433	0.451
DS-Distill-Qwen-1.5B	32k+ / 8k	0.181	0.215	0.758	0.515	0.237	0.353	0.376
STILL-3-1.5B-preview	29k / 32k	0.325	0.248	0.844	0.667	0.290	0.454	0.471
DSR-1.5B-Preview	24k / 32k	0.431	0.304	0.878	0.736	0.302	0.500	0.525
DSR-1.5B-Preview	24k / 8k	0.358	0.258	0.860	0.679	0.297	0.473	0.488
GRPO	8k / 8k	0.277	0.242	0.838	0.647	0.276	0.462	0.457
GRPO+ER	8k / 8k	0.298	0.242	0.839	0.649	0.279	0.452	0.460
Dr. GRPO	8k / 8k	0.250	0.238	0.830	0.629	0.270	0.443	0.443
DAPO	8k / 8k	0.310	0.252	0.848	0.675	0.296	0.456	0.473
TRPA	8k / 8k	0.354	0.235	0.835	0.653	0.283	0.458	0.470
DisCO (L-ratio)	8k / 8k	0.381	0.306	0.878	0.746	0.319	0.512	0.524
DisCO (log-L)	8k / 8k	0.404	0.317	0.876	0.758	0.333	0.509	0.533
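The Avg. column appears to be the unweighted mean of the six benchmark scores. The short Python check below recomputes it for the 8k / 8k rows, using scores copied from the table. The tolerance of 5e-4 reflects the assumption that reported values are rounded to three decimals.

```python
# Scores copied from the 8k / 8k rows of the table:
# (AIME 2024, AIME 2025, MATH 500, AMC 2023, Minerva, O-Bench), reported Avg.
ROWS = {
    "GRPO": ((0.277, 0.242, 0.838, 0.647, 0.276, 0.462), 0.457),
    "GRPO+ER": ((0.298, 0.242, 0.839, 0.649, 0.279, 0.452), 0.460),
    "Dr. GRPO": ((0.250, 0.238, 0.830, 0.629, 0.270, 0.443), 0.443),
    "DAPO": ((0.310, 0.252, 0.848, 0.675, 0.296, 0.456), 0.473),
    "TRPA": ((0.354, 0.235, 0.835, 0.653, 0.283, 0.458), 0.470),
    "DisCO (L-ratio)": ((0.381, 0.306, 0.878, 0.746, 0.319, 0.512), 0.524),
    "DisCO (log-L)": ((0.404, 0.317, 0.876, 0.758, 0.333, 0.509), 0.533),
}


def unweighted_mean(scores):
    # Each benchmark counts equally, regardless of its number of problems.
    return sum(scores) / len(scores)


for name, (scores, reported) in ROWS.items():
    mean = unweighted_mean(scores)
    # Reported averages are rounded to 3 decimals, so allow half a unit of slack.
    assert abs(mean - reported) <= 5e-4, name
    print(f"{name:16s} recomputed={mean:.4f} reported={reported:.3f}")
```

Because the mean is unweighted, small benchmarks such as AIME carry as much weight in Avg. as MATH 500.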

Citation

@article{li2025disco,
  title={DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization},
  author={Li, Gang and Lin, Ming and Galanti, Tomer and Tu, Zhengzhong and Yang, Tianbao},
  journal={arXiv preprint arXiv:2505.12366},
  year={2025}
}
