VeriReason-Qwen2.5-1.5b-RTLCoder-Verilog-GRPO-reasoning-tb
For implementation details, visit our GitHub repository: VeriReason
Check out our paper: VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
Update Log
2025.05.17: Initial release of VeriReason-Qwen2.5-1.5b-RTLCoder-Verilog-GRPO-reasoning-tb
Project Description
This is the model accompanying the paper VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation. The paper introduces VeriReason, an approach that uses reinforcement learning with testbench feedback to improve pre-trained models at Verilog RTL code generation. VeriReason-Qwen2.5-1.5b is a 1.5B-parameter model based on Qwen2.5-1.5B that combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, specifically tailored for RTL code generation.
The model integrates explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state of the art for automated RTL synthesis at this model size. Trained on our curated high-quality examples with a feedback-driven reward model, this 1.5B-parameter model delivers strong performance on Verilog generation tasks while remaining efficient.
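As a rough illustration of the feedback-driven reward idea, the sketch below combines the testbench pass rate with a structural-similarity term. The function name, weights, and inputs are expository assumptions, not the reward implementation released with the paper.

```python
def testbench_feedback_reward(compiles: bool, passed: int, total: int,
                              structure_score: float) -> float:
    """Toy reward sketch: functional correctness from testbench simulation dominates,
    with a smaller term for structural similarity to a reference design."""
    if not compiles:
        return -1.0                      # penalize code that fails to compile/elaborate
    pass_rate = passed / max(total, 1)   # fraction of testbench checks that pass
    return 0.8 * pass_rate + 0.2 * structure_score
```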
Installation
To install this project, follow these steps:
- Clone the repository:
```bash
git clone https://github.com/NellyW8/VeriReason.git
```
- Navigate to the project directory:
```bash
cd VeriReason
```
- Install the dependencies as specified in the repository
Usage
You can use the model with the transformers library:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model in half precision
model_name = "Nellyw888/VeriReason-Qwen2.5-1.5b-RTLCoder-Verilog-GRPO-reasoning-tb"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

prompt = """
Please act as a professional verilog designer. Develop a module that implements an 8-bit comparator. The module should have two 8-bit inputs and one output. If the first input is greater than the second input, the output should be high. Otherwise, the output should be low. First, think through the design approach, considering the functionality, inputs, outputs, and implementation details. Then provide the complete Verilog code implementation. Respond in the following format: <think>
...
</think>
<answer>
```verilog
...```
</answer>
"""

# Generate a reasoning trace followed by the Verilog implementation
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=1024, do_sample=True, temperature=0.2, top_p=0.95)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
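The response interleaves the reasoning trace (inside the <think> tags) with the final RTL (inside the <answer> tags). If you only need the Verilog source, a small post-processing step like the sketch below works; extract_verilog is a hypothetical helper, not part of the released code.

```python
import re

def extract_verilog(response: str) -> str:
    """Pull the Verilog source out of the <answer> ... </answer> block."""
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    body = answer.group(1) if answer else response
    # Prefer a fenced ```verilog block if the model emitted one.
    fenced = re.search(r"```verilog(.*?)```", body, re.DOTALL)
    return (fenced.group(1) if fenced else body).strip()

print(extract_verilog(result))
```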
Training
The GRPO (Guided Reward Proximal Optimization) training is based on the OpenR1 framework. For training with GRPO:
- Move the necessary files to the OpenR1 directory:
```bash
mv verilog_rewards_tb.py verilog_train_tb.py src/open_r1/
```
- Create a directory for the Verilog recipe:
```bash
mkdir verilog_recipe
mv verilog_grpo_tb.yaml verilog_recipe/
```
- Run training:
```bash
NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=0,1,2 ACCELERATE_USE_NCCL=1 \
  accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 \
  src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
```
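For orientation, OpenR1's GRPO training loop is built on TRL's GRPOTrainer, which samples several completions per prompt and scores each with one or more reward functions. The snippet below is a minimal sketch of that wiring only; the toy reward, tiny in-memory dataset, and hyperparameters are stand-ins for what verilog_rewards_tb.py and verilog_grpo_tb.yaml actually provide (compilation- and testbench-based rewards plus the full training recipe).

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def verilog_reward(completions, **kwargs):
    """Toy stand-in reward: the real rewards run compilation and testbench simulation."""
    return [1.0 if "endmodule" in c else 0.0 for c in completions]

# Minimal dataset with the 'prompt' column GRPOTrainer expects.
dataset = Dataset.from_dict({"prompt": ["Design an 8-bit comparator in Verilog."]})

training_args = GRPOConfig(
    output_dir="outputs/verireason-grpo-sketch",
    num_generations=4,                 # completions sampled per prompt
    per_device_train_batch_size=4,     # global batch must be divisible by num_generations
    max_completion_length=1024,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B",         # base model id, used here for illustration
    reward_funcs=verilog_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```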
Citation
Please cite our paper if you use our model or dataset:
```bibtex
@misc{wang2025verireasonreinforcementlearningtestbench,
  title={VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation},
  author={Yiting Wang and Guoheng Sun and Wanghao Ye and Gang Qu and Ang Li},
  year={2025},
  eprint={2505.11849},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2505.11849},
}
```
Acknowledgement
This repo benefits from OpenR1 and LLaMA-Factory.
Base model: Qwen/Qwen2.5-1.5B