Update README.md
README.md
CHANGED
@@ -40,8 +40,8 @@ Polaris is an open-source post-training method that uses reinforcement learning
 - **Inference-Time Length:** Polaris incorporates length extrapolation techniques for generating longer CoT at the inference stage. This enables a *"train-short, generate-long"* paradigm for CoT reasoning, mitigating the computational burden of training with excessively long rollouts.
 - **Exploration Efficiency:** Exploration efficiency in Polaris is enhanced through multi-stage training. However, reducing the model's response length in the first stage poses potential risks. A more conservative approach would be to directly allow the model to "think longer" from the beginning.
 
-The details of our training recipe and analysis can be found in our [blog post]().
-The code and data for reproducing our results can be found in our [github repo]().
+The details of our training recipe and analysis can be found in our [blog post](https://hkunlp.github.io/blog/2025/Polaris).
+The code and data for reproducing our results can be found in our [github repo](https://github.com/ChenxinAn-fdu/POLARIS).
 
 ### Evaluation Results
 