
This model was trained with WSPO (Weak-to-Strong Preference Optimization) from DeepSeek-R1-Distill-Qwen-7B on the OpenThought-220 long-thought dataset, using DeepSeek-R1-Distill-Qwen-1.5B and DeepScaleR-1.5B-Preview as the weak model pair.
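A minimal inference sketch with 🤗 Transformers is shown below. The repo ID matches this card; the prompt and generation settings are illustrative assumptions, not the authors' recommended configuration.

```python
# Minimal inference sketch, assuming the standard 🤗 Transformers causal-LM API.
# Sampling parameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wh-zhu/DeepScaleR-7B-WSPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve: if 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```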


Citation

@inproceedings{zhu2025weaktostrong,
  title={Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model},
  author={Wenhong Zhu and Zhiwei He and Xiaofeng Wang and Pengfei Liu and Rui Wang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=f7KxfUrRSb}
}
Model size: 7.62B parameters (Safetensors, BF16)
