Chancy committed on
Commit b1a1246 · verified · 1 parent: d360f7f

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -30,7 +30,9 @@ base_model:
  </div>
 
  ## Overview
- Polaris is an opensource posttraining method that applies reinforcement learning (RL) to scale up models that already exhibit strong reasoning abilities. Our approach demonstrates that even a 4B model (such as [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)) can achieve incredible improvements on complex reasoning tasks. In our experiments, Polaris-4B-Preview obtains remarkable results on challenging benchmarks, significantly outperforming several leading commercial systems like Claude‑4‑Opus and Grok‑3‑Beta.
+ Polaris is an open-source post-training method that uses reinforcement learning (RL) scaling to refine and enhance models with advanced reasoning abilities. Our research shows that even top-tier models like Qwen3-4B can achieve significant improvements on challenging reasoning tasks when optimized with Polaris.
+ By leveraging open-source data and academic-level resources, Polaris pushes the capabilities of open-recipe reasoning models to unprecedented heights. In benchmark tests, our method even surpasses top commercial systems, including Claude-4-Opus, Grok-3-Beta, and o3-mini-high (2025/01/03).
+
 
  ## Polaris's Recipe
  - **Data Difficulty:** Before training, Polaris analyzes and maps the distribution of data difficulty. The dataset should not be overwhelmed by either overly difficult or trivially easy problems. We recommend using a data distribution with a slight bias toward challenging problems, which typically exhibits a mirrored J-shaped distribution.
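The Data Difficulty step in the README context above is described only in prose. The following is a minimal Python sketch of one way such a difficulty distribution could be inspected, assuming difficulty is measured as the number of correct answers out of k sampled rollouts per problem. The function names, the use of k rollouts, and the `cap` threshold are hypothetical illustrations, not taken from the Polaris code.

```python
from collections import Counter


# Hypothetical helper (not Polaris code): difficulty of each problem is assumed
# to be the number of correct answers out of k sampled rollouts.
def difficulty_histogram(correct_counts: list[int], k: int) -> list[int]:
    """Return how many problems fall into each bucket 0..k of correct rollouts."""
    for c in correct_counts:
        if not 0 <= c <= k:
            raise ValueError(f"count {c} outside [0, {k}]")
    counts = Counter(correct_counts)
    return [counts.get(i, 0) for i in range(k + 1)]


def hard_fraction(hist: list[int]) -> float:
    """Fraction of problems in the harder half (fewer than k/2 correct)."""
    k = len(hist) - 1
    total = sum(hist)
    if total == 0:
        return 0.0
    hard = sum(n for i, n in enumerate(hist) if i < k / 2)
    return hard / total


def summarize(correct_counts: list[int], k: int, cap: float = 0.3) -> dict:
    """Flag a set dominated by unsolved (0/k) or trivial (k/k) problems.

    `cap` is a hypothetical threshold chosen for illustration only.
    """
    hist = difficulty_histogram(correct_counts, k)
    n = len(correct_counts)
    return {
        "histogram": hist,
        "hard_fraction": hard_fraction(hist),
        "too_many_unsolved": n > 0 and hist[0] / n > cap,
        "too_many_trivial": n > 0 and hist[k] / n > cap,
    }
```

For example, with k = 8 and counts `[0, 1, 1, 2, 2, 3, 5, 8]`, the histogram is `[1, 2, 2, 1, 0, 1, 0, 0, 1]` and 6 of 8 problems (0.75) sit in the harder half, while neither the 0/8 nor the 8/8 bucket exceeds the illustrative cap. Checking the two extreme buckets separately mirrors the README's point that neither overly difficult nor trivially easy problems should dominate.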