# DisCO-7B-Lratio
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B on the agentica-org/DeepScaleR-Preview-Dataset.
It was produced as part of the paper *DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization* (paper link). Specifically, this model was fine-tuned with the DisCO framework using the likelihood ratio (L-ratio) score function.
The code is available at: https://github.com/Optimization-AI/DisCO
Below are comparisons with baseline models and baseline methods for fine-tuning 7B models. MRL denotes the maximum response length used in training/testing. The bottom seven methods all fine-tune the DeepSeek-R1-Distill-Qwen-7B model on the same DeepScaleR dataset. DS is short for DeepSeek-R1.
| Model | MRL (Train/Test) | AIME 2024 | AIME 2025 | MATH 500 | AMC 2023 | Minerva | O-Bench | Avg. |
|---|---|---|---|---|---|---|---|---|
| DS-Distill-Qwen-7B | 32k+ / 32k | 0.560 | 0.396 | 0.923 | 0.825 | 0.380 | 0.568 | 0.609 |
| DS-Distill-Qwen-7B | 32k+ / 8k | 0.402 | 0.292 | 0.873 | 0.688 | 0.355 | 0.471 | 0.513 |
| GRPO-LEAD-7B | 8k / 8k | 0.470 | 0.345 | 0.893 | 0.748 | 0.372 | 0.500 | 0.555 |
| TRPA | 8k / 8k | 0.570 | - | 0.870 | 0.780 | 0.360 | 0.550 | - |
| GRPO | 8k / 8k | 0.498 | 0.394 | 0.916 | 0.807 | 0.381 | 0.555 | 0.592 |
| GRPO+ER | 8k / 8k | 0.515 | 0.381 | 0.916 | 0.825 | 0.376 | 0.544 | 0.593 |
| Dr. GRPO | 8k / 8k | 0.488 | 0.346 | 0.910 | 0.792 | 0.368 | 0.546 | 0.575 |
| DAPO | 8k / 8k | 0.454 | 0.335 | 0.907 | 0.799 | 0.388 | 0.535 | 0.570 |
| TRPA | 8k / 8k | 0.510 | 0.367 | 0.898 | 0.779 | 0.379 | 0.534 | 0.578 |
| DisCO (L-ratio) | 8k / 8k | 0.583 | 0.421 | 0.923 | 0.852 | 0.399 | 0.585 | 0.627 |
| DisCO (log-L) | 8k / 8k | 0.558 | 0.410 | 0.927 | 0.854 | 0.410 | 0.592 | 0.625 |
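The Avg. column appears to be the unweighted mean of the six benchmark scores, rounded to three decimals; a quick sanity check on the two DisCO rows (values copied from the table above):

```python
# Assumption: Avg. is the unweighted mean of the six benchmark scores.
# Per-row values are copied verbatim from the table above.
rows = {
    "DisCO (L-ratio)": [0.583, 0.421, 0.923, 0.852, 0.399, 0.585],
    "DisCO (log-L)":   [0.558, 0.410, 0.927, 0.854, 0.410, 0.592],
}
reported_avg = {"DisCO (L-ratio)": 0.627, "DisCO (log-L)": 0.625}

for name, scores in rows.items():
    computed = round(sum(scores) / len(scores), 3)
    print(f"{name}: computed {computed}, reported {reported_avg[name]}")
    assert computed == reported_avg[name]
```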
## Citation
```bibtex
@article{li2025disco,
  title={DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization},
  author={Li, Gang and Lin, Ming and Galanti, Tomer and Tu, Zhengzhong and Yang, Tianbao},
  journal={arXiv preprint arXiv:2505.12366},
  year={2025}
}
```