Model Card for micaebe/Qwen2.5-0.5B-MCTS-Value-Net

This is a Qwen2.5 0.5B Instruct model fine-tuned on a dataset generated by Monte Carlo Tree Search (MCTS) based sampling. MCTS was rolled out on a small subset of the GSM8K train split, and the resulting traces and value estimates were used to form the dataset. Only the last two transformer blocks and the regression head were unfrozen; all other weights stayed frozen.
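The partial unfreezing described above can be sketched as a name-based filter over the model's parameters. This is only an illustration, not the training code used for this model: the helper `select_trainable`, the `layers.<i>.` naming pattern, the head prefix `score.`, and the layer count of 24 are all assumptions about how the checkpoint is laid out.

```python
import re
from typing import Iterable, List


def select_trainable(param_names: Iterable[str], num_layers: int,
                     n_last: int = 2, head_prefix: str = "score.") -> List[str]:
    """Return the parameter names that should stay trainable.

    Assumes transformer blocks are named '...layers.<index>.' and that the
    regression head's parameters start with `head_prefix` (both hypothetical).
    """
    keep_layers = set(range(num_layers - n_last, num_layers))
    layer_pattern = re.compile(r"(?:^|\.)layers\.(\d+)\.")
    trainable = []
    for name in param_names:
        match = layer_pattern.search(name)
        in_last_blocks = match is not None and int(match.group(1)) in keep_layers
        if in_last_blocks or name.startswith(head_prefix):
            trainable.append(name)
    return trainable


# Hypothetical parameter names for a 24-block model with a regression head.
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.22.mlp.up_proj.weight",
    "model.layers.23.input_layernorm.weight",
    "model.norm.weight",
    "score.weight",
]
trainable = select_trainable(names, num_layers=24)
print(trainable)

# With a PyTorch model, the selection could then be applied roughly like this:
#   keep = set(select_trainable([n for n, _ in model.named_parameters()], 24))
#   for n, p in model.named_parameters():
#       p.requires_grad = n in keep
```

Filtering by name keeps the sketch independent of any particular model class; the real prefix and layer count should be read from the loaded model rather than taken from here.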

The idea is to run MCTS sampling with the value network alone, so that no simulation/rollout is needed to evaluate a node.
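A minimal sketch of rollout-free MCTS is shown below: when a leaf is reached, it is scored by a value function instead of being simulated to the end. Everything here is illustrative. `Node`, `uct`, `mcts`, the binary-string toy states, and `toy_value` are hypothetical stand-ins; in the setting of this model card the value function would be the fine-tuned network scoring a partial solution, and `expand` would propose candidate next reasoning steps.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    state: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def uct(child: Node, parent_visits: int, c: float) -> float:
    """Upper confidence bound; unvisited children are tried first."""
    if child.visits == 0:
        return math.inf
    return child.mean_value() + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(root_state: str, expand: Callable[[str], List[str]],
         value_fn: Callable[[str], float], n_iter: int = 100, c: float = 1.4) -> Node:
    root = Node(root_state)
    for _ in range(n_iter):
        # 1. Selection: walk down the tree, following the highest UCT score.
        node = root
        while node.children:
            parent_visits = node.visits
            node = max(node.children, key=lambda ch: uct(ch, parent_visits, c))
        # 2. Expansion: expand the root, or a leaf that was already evaluated once.
        if node is root or node.visits > 0:
            node.children = [Node(s, parent=node) for s in expand(node.state)]
            if node.children:
                node = node.children[0]
        # 3. Evaluation: the value function replaces the usual random rollout.
        value = value_fn(node.state)
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return root


MAX_DEPTH = 3


def expand(state: str) -> List[str]:
    """Toy move generator: append one bit until MAX_DEPTH is reached."""
    return [] if len(state) >= MAX_DEPTH else [state + "0", state + "1"]


def toy_value(state: str) -> float:
    """Toy stand-in for a value network: larger binary numbers score higher."""
    return int(state.ljust(MAX_DEPTH, "0"), 2) / (2 ** MAX_DEPTH - 1)


root = mcts("", expand, toy_value, n_iter=100)
best = max(root.children, key=lambda ch: ch.visits)
print(best.state, best.visits)
```

Because step 3 is a single function call, the cost per iteration is one value-network evaluation rather than a full generation to the end of the answer.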

The value network currently overfits because the training set is very small. An update is planned once more data has been sampled.

Scores on the first 65 samples of the GSM8K test split:

  • Beam-search (3 beams): 40.0%
  • MCTS-search (3 beams): 50.77%
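With only 65 samples, each percentage corresponds to a whole number of correct answers, and a single question moves the score by about 1.5 points. The short check below recovers those counts from the reported percentages; the helper name `correct_count` is hypothetical.

```python
N_SAMPLES = 65


def correct_count(percent: float, n: int = N_SAMPLES) -> int:
    """Find the unique number of correct answers matching a rounded percentage."""
    matches = [k for k in range(n + 1) if abs(100 * k / n - percent) < 0.005]
    if len(matches) != 1:
        raise ValueError(f"no unique count for {percent}% of {n}")
    return matches[0]


beam_correct = correct_count(40.0)
mcts_correct = correct_count(50.77)
print(beam_correct, mcts_correct, mcts_correct - beam_correct)
print(round(100 / N_SAMPLES, 2))  # percentage points per question
```

The gap between the two methods is therefore a handful of questions, which is worth keeping in mind when reading the percentages.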

The final rollout of the MCTS search is also done via beam search. During testing on GSM8K, only the value network was used to guide the search.
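For reference, beam search with three beams keeps the three highest-scoring partial sequences at every step, ranked by cumulative log-probability. The sketch below shows that mechanism on a hand-written toy distribution; `beam_search`, `toy_step_logprobs`, and the token table are hypothetical and unrelated to the actual model or tokenizer.

```python
import math
from typing import Callable, Dict, List, Tuple


def beam_search(step_logprobs: Callable[[str], Dict[str, float]],
                num_beams: int = 3, max_len: int = 2) -> List[Tuple[float, str]]:
    """Keep the `num_beams` best sequences by cumulative log-probability."""
    beams: List[Tuple[float, str]] = [(0.0, "")]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            for token, token_logp in step_logprobs(seq).items():
                candidates.append((logp + token_logp, seq + token))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda item: item[0], reverse=True)[:num_beams]
    return beams


# Toy next-token probabilities: greedy decoding would start with "a",
# but the most likely full sequence starts with "b".
TOY_PROBS = {
    "": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
}


def toy_step_logprobs(seq: str) -> Dict[str, float]:
    return {tok: math.log(p) for tok, p in TOY_PROBS.get(seq, {}).items()}


beams = beam_search(toy_step_logprobs, num_beams=3, max_len=2)
for logp, seq in beams:
    print(seq, round(math.exp(logp), 2))
```

In this toy case the top beam is a sequence that greedy decoding would have missed, which is the reason for keeping several candidates alive.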

All tests were done with Qwen2.5 0.5B Instruct.

Model size: 494M params (Safetensors, tensor type F32)

Model tree for micaebe/Qwen2.5-0.5B-MCTS-Value-Net

Base model: Qwen/Qwen2.5-0.5B
Finetuned: this model