---
license: apache-2.0
datasets:
- THU-KEG/Crab-VerIF
language:
- en
- zh
base_model:
- allenai/Llama-3.1-Tulu-3-8B-SFT
pipeline_tag: text2text-generation
---

# Model Card

## Model Details

### Model Description

- **Developed by:** Hao Peng @ THU-KEG
- **Model type:** RL-trained LLM
- **Language(s) (NLP):** English, Chinese
- **License:** apache-2.0
- **Finetuned from model:** allenai/Llama-3.1-Tulu-3-8B-SFT

### Model Sources

- **Repository:** https://github.com/THU-KEG/VerIF
- **Paper:** https://arxiv.org/abs/2506.09942

## Training Details

The model is trained with reinforcement learning using VerIF on the training data [VerInstruct](https://huggingface.co/datasets/THU-KEG/VerInstruct).

VerIF is a practical and efficient verification method for instruction-following reinforcement learning. Built on the idea of Reinforcement Learning with Verifiable Rewards (RLVR), VerIF combines rule-based code checks with LLM-based reasoning verification (e.g., QwQ-32B) to provide accurate and scalable reward signals; a schematic sketch of this hybrid reward is given at the end of this card. The model is optimized for instruction following without affecting its other general capabilities.

## Evaluation Results

We evaluate the model on several representative instruction-following benchmarks, including IFEval, Multi-IF, SysBench, and FollowBench.

![Results](./results.png)

You can find more details in our GitHub repo (https://github.com/THU-KEG/VerIF).

If you find this model helpful, please kindly cite us:

```
@misc{peng2025verif,
      title={VerIF: Verification Engineering for Reinforcement Learning in Instruction Following},
      author={Hao Peng and Yunjia Qi and Xiaozhi Wang and Bin Xu and Lei Hou and Juanzi Li},
      year={2025},
      eprint={2506.09942},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.09942},
}
```
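
## How the VerIF Reward Works (Sketch)

To make the hybrid verification described in Training Details concrete, here is a hypothetical sketch of how a VerIF-style reward could be assembled: hard, programmatically checkable constraints are verified with code, while soft constraints are judged by a reasoning LLM. The function names, constraint schema, and all-pass aggregation below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical VerIF-style hybrid reward: rule-based code checks for hard constraints,
# an LLM judge (e.g., QwQ-32B) for soft constraints; both must pass to earn reward 1.0.
def rule_checks(response: str, constraints: dict) -> bool:
    """Verify constraints that can be checked programmatically (length, keywords, ...)."""
    if "max_words" in constraints and len(response.split()) > constraints["max_words"]:
        return False
    if "must_include" in constraints:
        if not all(kw.lower() in response.lower() for kw in constraints["must_include"]):
            return False
    return True

def llm_verify(response: str, soft_constraints: list[str]) -> bool:
    """Placeholder for LLM-based reasoning verification (prompt a judge model
    to decide whether each soft constraint is satisfied)."""
    raise NotImplementedError("call your reasoning-LLM judge here")

def verif_reward(response: str, constraints: dict, soft_constraints: list[str]) -> float:
    """Binary RLVR signal: 1.0 only if both the code checks and the LLM judge pass."""
    if not rule_checks(response, constraints):
        return 0.0
    return 1.0 if llm_verify(response, soft_constraints) else 0.0
```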
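
## Quick Start

A minimal inference sketch using the Hugging Face `transformers` chat API. The repo ID below is a placeholder (the published model ID is not stated in this card); replace it with the actual Hub ID or a local path to the weights.

```python
# Minimal inference sketch; "THU-KEG/VerIF-RL-8B" is a placeholder model ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THU-KEG/VerIF-RL-8B"  # placeholder: replace with the real repo ID or a local path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user",
     "content": "Summarize RLVR in exactly two sentences and do not use the word 'reward'."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```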