Introduction

We release our first reflective generative model: MetaStone-S1. With only 32B parameters, MetaStone-S1 performs comparably to the OpenAI-o3 series on mathematics, coding, and Chinese reasoning tasks.

(Figure: performance compared with OpenAI-o3-mini.)

MetaStone-S1 is trained with our proposed reflective generative form, which unifies "Long-CoT Reinforcement Learning" and "Process Reward Learning" in a single training framework. This form enables one model to both reason deeply and select high-quality reasoning trajectories. By sharing the backbone network between the PRM and the policy model, MetaStone-S1 reduces the inference cost of the PRM by 99%, resulting in faster and higher-quality responses.
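
As a rough illustration of the shared-backbone idea, the sketch below attaches a hypothetical linear scoring head (`SPRMHead`) to the policy model's hidden states, so that scoring a trajectory reuses the same forward pass as generation. The head design and pooling are illustrative assumptions, not the exact architecture from the paper.

```python
# Rough sketch (not the official implementation): a small scoring head that
# reuses the policy backbone's hidden states, so judging a reasoning
# trajectory does not require a separate PRM forward pass.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

class SPRMHead(nn.Module):
    """Hypothetical process-reward head: one linear projection over hidden states."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch, seq_len, hidden]; pool per-token scores into
        # a single trajectory-level score (the pooling choice is an assumption).
        return self.score(hidden_states).squeeze(-1).mean(dim=-1)

tokenizer = AutoTokenizer.from_pretrained("MetaStoneTec/MetaStone-S1-7B")
policy = AutoModelForCausalLM.from_pretrained(
    "MetaStoneTec/MetaStone-S1-7B", torch_dtype=torch.bfloat16
)
sprm_head = SPRMHead(policy.config.hidden_size)

# One forward pass yields both the next-token logits used for generation and
# the hidden states the SPRM head scores; this sharing is where the PRM
# inference-cost saving comes from.
inputs = tokenizer("Solve: 12 * 7 = ?", return_tensors="pt")
with torch.no_grad():
    out = policy(**inputs, output_hidden_states=True)
    trajectory_score = sprm_head(out.hidden_states[-1].float())
```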

This repo contains the training and evaluation code of MetaStone-S1. For full details please refer to our paper and our official website.

Performance

| Model | AIME24 | AIME25 | LiveCodeBench | C-EVAL |
|---|---|---|---|---|
| DeepScaleR-1.5B-Preview | 43.1 | 30.0 | - | - |
| R1-Distill-Qwen-1.5B | 28.9 | 22.8 | 16.9 | 27.1 |
| R1-Distill-Qwen-7B | 55.5 | - | 37.6 | - |
| R1-Distill-Llama-8B | 50.4 | - | 39.6 | - |
| MetaStone-S1-7B-low | 60.7 | 45.4 | 41.7 | 55.1 |
| MetaStone-S1-7B-medium | 66.3 | 48.3 | 44.1 | 57.5 |
| MetaStone-S1-7B-high | 70.2 | 48.6 | 44.4 | 57.8 |

Model

We save the parameters of the policy model and the SPRM head into two separate files (a minimal loading sketch follows this list):

  • "model.safetensors" is the checkpoint of the policy model.

  • "score_module.pt" is the checkpoint of the SPRM head.
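
A minimal loading sketch, assuming "score_module.pt" stores a plain PyTorch state_dict for the SPRM head; the actual head definition and how it attaches to the backbone are given in our GitHub repository.

```python
# Minimal loading sketch; assumes "score_module.pt" holds a torch state_dict
# for the SPRM head (see the GitHub repository for the actual head definition).
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM

repo_id = "MetaStoneTec/MetaStone-S1-7B"

# "model.safetensors" (the policy model) loads through Transformers as usual.
policy = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# "score_module.pt" (the SPRM head) is fetched and loaded separately.
sprm_path = hf_hub_download(repo_id=repo_id, filename="score_module.pt")
sprm_state = torch.load(sprm_path, map_location="cpu")
```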

You can find other sizes of MetaStone‑S1 below:

| Model | Transformers (HF) | ModelScope |
|---|---|---|
| MetaStone-S1-1.5B | MetaStone-S1-1.5B | MetaStone-S1-1.5B |
| MetaStone-S1-7B | MetaStone-S1-7B | MetaStone-S1-7B |
| MetaStone-S1-32B | MetaStone-S1-32B | MetaStone-S1-32B |

Training & Evaluation

Since Hugging Face Transformers does not directly support inference with the SPRM head, please refer to our GitHub repository for the detailed training and evaluation pipeline.
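
As a rough illustration of SPRM-guided test-time scaling, the sketch below runs a generic best-of-N loop: sample several reasoning trajectories from the policy and keep the one the reward head scores highest. Here `score_fn` is a placeholder for SPRM scoring (e.g., the head from the earlier sketch) and the sampling settings are arbitrary; the official pipeline in the GitHub repository may differ.

```python
# Generic best-of-N selection sketch: sample several trajectories and keep the
# one the (assumed) SPRM scoring function rates highest.
import torch

def best_of_n(question, policy, tokenizer, score_fn, n=8, max_new_tokens=512):
    """Return the highest-scoring of n sampled reasoning trajectories.
    `score_fn` maps generated token ids to a scalar score (placeholder)."""
    inputs = tokenizer(question, return_tensors="pt")
    candidates, scores = [], []
    for _ in range(n):
        with torch.no_grad():
            out = policy.generate(
                **inputs, do_sample=True, temperature=0.7,
                max_new_tokens=max_new_tokens,
            )
        candidates.append(tokenizer.decode(out[0], skip_special_tokens=True))
        scores.append(float(score_fn(out)))
    return candidates[int(torch.tensor(scores).argmax())]
```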

Citation

If you find our work helpful, please consider citing it.

@misc{wang2025testtimescalingreflectivegenerative,
  title={Test-Time Scaling with Reflective Generative Model},
  author={Zixiao Wang and Yuxin Wang and Xiaorui Wang and Mengting Xing and Jie Gao and Jianjun Xu and Guangcan Liu and Chenhui Jin and Zhuo Wang and Shengzhuo Zhang and Hongtao Xie},
  year={2025},
  eprint={2507.01951},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2507.01951},
}