INFLogic-Qwen2.5-32B-RL-Preview
Model Overview
- INFLogic-Qwen2.5-32B-RL-Preview enhances the reasoning capabilities of DeepSeek-R1-Distill-Qwen-32B through fine-tuning on our proprietary logical reasoning dataset using reinforcement learning with verifiable rewards (RLVR).
- As of March 27th, 2025, this model achieves state-of-the-art performance among open-source LLMs on ZebraLogicBench, scoring 84.1 against 68.7 for its base model, DeepSeek-R1-Distill-Qwen-32B.
Evaluation Results
Model | MATH-500 | ZebraLogic | GPQA |
---|---|---|---|
INFLogic-Qwen2.5-32B-RL-Preview | 95.6 | 84.1 | 65.7 |
DeepSeek-R1-Distill-Qwen-32B | 94.3 | 68.7 | 62.1 |
DeepSeek-R1 | 96.2 | 77.2 | 78.9 |
OpenAI o1 | 96.4 | 87.9 | 85.2 |
We report pass@1 scores using vLLM 0.5.3 (temperature=0.6, top_p=0.95). For MATH-500 and GPQA, we used Open R1's evaluation scripts. Other models' results come from their original reports.
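For readers unfamiliar with the metric, the following is a minimal sketch of how a pass@1 score can be computed from per-problem correctness flags. It is not the evaluation code used for the table above. The number of samples drawn per problem is not stated here, so the sketch accepts any number; the function names `pass_at_k` and `benchmark_pass_at_1` and the `SAMPLING` dictionary are hypothetical and exist only for illustration.

```python
from math import comb
from statistics import mean

# Sampling settings quoted from the evaluation note above.
# The dictionary itself is illustrative and is not passed to any library here.
SAMPLING = {"temperature": 0.6, "top_p": 0.95}


def pass_at_k(n: int, c: int, k: int = 1) -> float:
    """Unbiased pass@k estimate for one problem with n samples, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k draw contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[list[bool]]) -> float:
    """Average pass@1 over all problems, expressed as a percentage.

    results[i][j] is True when sample j for problem i was judged correct.
    """
    return 100.0 * mean(pass_at_k(len(r), sum(r), 1) for r in results)
```

With a single sample per problem, this reduces to plain accuracy. With several samples per problem, pass@1 becomes the average fraction of correct samples, which lowers the variance that sampling at temperature 0.6 introduces.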
Contributors
Supervisors
Wei Chu • Yuan Qi
Logic Team
Cheng Peng • Shuyao Xu • Weidi Xu
Acknowledgments
We thank Chao Qu, Haozhe Wang, Jiaran Hao, and Liuyihan Song for their valuable discussions and support.
Citation
If you find our model useful, please consider citing:
@misc{INFLogic_RL_Preview,
author = {Peng, Cheng and Xu, Shuyao and Xu, Weidi and Chu, Wei and Qi, Yuan},
title = {INFLogic-Qwen2.5-32B-RL-Preview},
year = {2025},
month = {March},
howpublished = {Hugging Face},
url = {https://huggingface.co/infly/INFLogic-Qwen2.5-32B-RL-Preview},
}