Update README.md
--- a/README.md
+++ b/README.md
@@ -60,7 +60,7 @@ The code and data for reproducing our results can be found in our [github repo](
 | **`POLARIS-4B-Preview`** | **81.2** | **79.4** | **44.0** | **69.1** | **94.8** |
 
 ## Acknowledgements
-The training and evaluation codebase is heavily built on [Verl](https://github.com/volcengine/verl). The reward function in polaris
+The training and evaluation codebase is heavily built on [Verl](https://github.com/volcengine/verl). The reward function in polaris is from [DeepScaleR](https://github.com/agentica-project/rllm). Our model is trained on top of [`Qwen3-4B`](https://huggingface.co/Qwen/Qwen3-4B) and [`DeepSeek-R1-Distill-Qwen-7B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B). Thanks for their wonderful work.
 
 
 ## Citation