Update README.md
README.md CHANGED
@@ -30,7 +30,9 @@ base_model:
 </div>
 
 ## Overview
-Polaris is an open
+Polaris is an open-source post-training method that uses reinforcement learning (RL) scaling to refine and enhance models with advanced reasoning abilities. Our research shows that even top-tier models like Qwen3-4B can achieve significant improvements on challenging reasoning tasks when optimized with Polaris.
+By leveraging open-source data and academic-level resources, Polaris pushes the capabilities of open-recipe reasoning models to unprecedented heights. In benchmark tests, our method even surpasses top commercial systems, including Claude-4-Opus, Grok-3-Beta, and o3-mini-high (2025/01/03).
+
 
 ## Polaris's Recipe
 - **Data Difficulty:** Before training, Polaris analyzes and maps the distribution of data difficulty. The dataset should not be overwhelmed by either overly difficult or trivially easy problems. We recommend using a data distribution with a slight bias toward challenging problems, which typically exhibits a mirrored J-shaped distribution.
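The sketch below illustrates the kind of difficulty mapping the **Data Difficulty** bullet describes; it is not from the Polaris codebase. It assumes per-problem pass rates measured by sampling the model several times on each prompt, and every specific in it is invented for illustration: the synthetic rates, the `K = 16` rollout count, the squared difficulty weighting, and the target set size of 5,000. Problems the model always or never solves are dropped, and the remainder is resampled with difficulty-biased weights so the kept set skews toward challenging problems.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical input: per-problem pass rates, as if each problem had been
# attempted K times by the base model (all values here are synthetic).
K = 16
pass_rates = {f"problem_{i}": random.randint(0, K) / K for i in range(10_000)}

def difficulty(pass_rate: float) -> float:
    """Difficulty as the model's empirical failure rate."""
    return 1.0 - pass_rate

# 1) Drop the extremes: problems the model always solves carry no learning
#    signal, and problems it never solves yield no reward during RL.
filtered = {pid: pr for pid, pr in pass_rates.items() if 0.0 < pr < 1.0}

# 2) Resample with weights that grow with difficulty, so the kept set skews
#    toward challenging problems (counts rising toward the hard end of the
#    difficulty axis, roughly the "mirrored J" shape).
weights = [difficulty(pr) ** 2 for pr in filtered.values()]  # exponent is a free knob
kept = random.choices(list(filtered), weights=weights, k=5_000)

# 3) Inspect the resulting difficulty histogram in ten bins.
hist = Counter(int(difficulty(filtered[pid]) * 10) for pid in kept)
for b in range(10):
    print(f"difficulty {b / 10:.1f}-{(b + 1) / 10:.1f}: {hist.get(b, 0)}")
```

The exponent on the difficulty weight is the knob that controls how "slight" the bias toward hard problems is: 1 gives a gentle tilt, larger values concentrate the set near the hard end.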