INFLogic-Qwen2.5-32B-RL-Preview

Model Overview

  • INFLogic-Qwen2.5-32B-RL-Preview enhances the reasoning capabilities of DeepSeek-R1-Distill-Qwen-32B by fine-tuning it on our proprietary logical reasoning dataset using reinforcement learning with verifiable rewards (RLVR); a high-level sketch of the verifiable-reward idea is shown below.
  • As of March 27, 2025, the model achieves state-of-the-art performance among open-source LLMs on ZebraLogicBench, demonstrating its strengthened logical reasoning.
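
As a high-level illustration of RLVR (a generic sketch in Python, not the actual training code used for this model; the boxed-answer output convention and exact-match verification step here are hypothetical), a verifiable reward scores a sampled completion 1.0 when its final answer matches the puzzle's known solution and 0.0 otherwise:

import re

def extract_final_answer(completion: str) -> str | None:
    # Pull the final answer from a \boxed{...} span (a hypothetical output convention).
    match = re.search(r"\\boxed\{(.+?)\}", completion)
    return match.group(1).strip() if match else None

def verifiable_reward(completion: str, gold_answer: str) -> float:
    # Binary, automatically checkable reward: 1.0 on an exact match with the known solution.
    predicted = extract_final_answer(completion)
    return 1.0 if predicted == gold_answer else 0.0

# The RL loop (e.g., PPO or GRPO) then maximizes this reward over sampled completions.
print(verifiable_reward(r"... so the owner of the red house is Alice. \boxed{Alice}", "Alice"))  # 1.0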

Evaluation Results

| Model                           | MATH-500 | ZebraLogic | GPQA |
|---------------------------------|----------|------------|------|
| INFLogic-Qwen2.5-32B-RL-Preview | 95.6     | 84.1       | 65.7 |
| DeepSeek-R1-Distill-Qwen-32B    | 94.3     | 68.7       | 62.1 |
| DeepSeek-R1                     | 96.2     | 77.2       | 78.9 |
| OpenAI o1                       | 96.4     | 87.9       | 85.2 |

We report pass@1 scores using vLLM 0.5.3 (temperature=0.6, top_p=0.95). For MATH-500 and GPQA, we used Open R1's evaluation scripts. Other models' results come from their original reports.
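
For reference, a minimal inference sketch with vLLM under the same sampling settings (the prompt, max_tokens, and tensor_parallel_size values here are illustrative, not prescribed):

from vllm import LLM, SamplingParams

# Load the released BF16 checkpoint; set tensor_parallel_size to match your GPU count.
llm = LLM(model="infly/INFLogic-Qwen2.5-32B-RL-Preview", tensor_parallel_size=4)

# Sampling settings from our evaluation: temperature=0.6, top_p=0.95.
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

prompt = "Solve the following logic puzzle step by step: ..."  # placeholder prompt
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)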

Contributors

Supervisors

Wei Chu • Yuan Qi

Logic Team

Cheng Peng • Shuyao Xu • Weidi Xu

Acknowledgments

We thank Chao Qu, Haozhe Wang, Jiaran Hao, and Liuyihan Song for their valuable discussions and support.

Citation

If you find our model useful, please consider citing:

@misc{INFLogic_RL_Preview,
  author       = {Peng, Cheng and Xu, Shuyao and Xu, Weidi and Chu, Wei and Qi, Yuan},
  title        = {INFLogic-Qwen2.5-32B-RL-Preview},
  year         = {2025},
  month        = {March},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/infly/INFLogic-Qwen2.5-32B-RL-Preview},
}