---
license: apache-2.0
datasets:
- THU-KEG/Crab-VerIF
language:
- en
- zh
base_model:
- allenai/Llama-3.1-Tulu-3-8B-SFT
pipeline_tag: text2text-generation
---

# Model Card

## Model Details

### Model Description

- **Developed by:** Hao Peng @ THU-KEG
- **Model type:** RL-trained LLM
- **Language(s) (NLP):** English, Chinese
- **License:** apache-2.0
- **Finetuned from model:** allenai/Llama-3.1-Tulu-3-8B-SFT

### Model Sources

- **Repository:** https://github.com/THU-KEG/VerIF
- **Paper:** https://arxiv.org/abs/2506.09942

## Training Details

The model is trained with reinforcement learning using VerIF on the training data [VerInstruct](https://huggingface.co/datasets/THU-KEG/VerInstruct).

VerIF is a practical and efficient verification method for instruction-following reinforcement learning. Built on the idea of Reinforcement Learning with Verifiable Rewards (RLVR), VerIF combines rule-based code checks with LLM-based reasoning verification (e.g., QwQ-32B) to provide accurate and scalable reward signals; a schematic sketch of this hybrid reward is given at the end of this card. The model is optimized for instruction following without affecting its other general capabilities.

## Evaluation Results

We evaluate the model on several representative instruction-following benchmarks, including IFEval, Multi-IF, SysBench, and FollowBench.

![Results](./results.png)

You can find more details in our GitHub repo (https://github.com/THU-KEG/VerIF).

If you find this model helpful, please kindly cite us:

```
@misc{peng2025verif,
      title={VerIF: Verification Engineering for Reinforcement Learning in Instruction Following},
      author={Hao Peng and Yunjia Qi and Xiaozhi Wang and Bin Xu and Lei Hou and Juanzi Li},
      year={2025},
      eprint={2506.09942},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.09942},
}
```
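
## How the VerIF Reward Works (Sketch)

To make the hybrid verification described in Training Details concrete, here is a hypothetical sketch of how a VerIF-style reward could be assembled: hard, programmatically checkable constraints are verified with code, while soft constraints are judged by a reasoning LLM. The function names, constraint schema, and all-pass aggregation below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical VerIF-style hybrid reward: rule-based code checks for hard constraints,
# an LLM judge (e.g., QwQ-32B) for soft constraints; both must pass to earn reward 1.0.
def rule_checks(response: str, constraints: dict) -> bool:
    """Verify constraints that can be checked programmatically (length, keywords, ...)."""
    if "max_words" in constraints and len(response.split()) > constraints["max_words"]:
        return False
    if "must_include" in constraints:
        if not all(kw.lower() in response.lower() for kw in constraints["must_include"]):
            return False
    return True

def llm_verify(response: str, soft_constraints: list[str]) -> bool:
    """Placeholder for LLM-based reasoning verification (prompt a judge model
    to decide whether each soft constraint is satisfied)."""
    raise NotImplementedError("call your reasoning-LLM judge here")

def verif_reward(response: str, constraints: dict, soft_constraints: list[str]) -> float:
    """Binary RLVR signal: 1.0 only if both the code checks and the LLM judge pass."""
    if not rule_checks(response, constraints):
        return 0.0
    return 1.0 if llm_verify(response, soft_constraints) else 0.0
```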
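
## Quick Start

A minimal inference sketch using the Hugging Face `transformers` chat API. The repo ID below is a placeholder (the published model ID is not stated in this card); replace it with the actual Hub ID or a local path to the weights.

```python
# Minimal inference sketch; "THU-KEG/VerIF-RL-8B" is a placeholder model ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THU-KEG/VerIF-RL-8B"  # placeholder: replace with the real repo ID or a local path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user",
     "content": "Summarize RLVR in exactly two sentences and do not use the word 'reward'."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```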