POLARIS-Project / Polaris-7B-Preview


Chancy committed 7 days ago

Commit b1a1246 · verified · 1 parent: d360f7f

Update README.md

Files changed (1)

README.md +3 -1

README.md CHANGED

@@ -30,7 +30,9 @@ base_model:
 </div>
 ## Overview
-Polaris is an open‐source post‐training method that applies reinforcement learning (RL) to scale up models that already exhibit strong reasoning abilities. Our approach demonstrates that even a 4B model (such as [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)) can achieve incredible improvements on complex reasoning tasks. In our experiments, Polaris-4B-Preview obtains remarkable results on challenging benchmarks, significantly outperforming several leading commercial systems like Claude‑4‑Opus and Grok‑3‑Beta.
+Polaris is an open-source post-training method that uses reinforcement learning (RL) scaling to refine and enhance models with advanced reasoning abilities. Our research shows that even top-tier models like Qwen3-4B can achieve significant improvements on challenging reasoning tasks when optimized with Polaris.
+ By leveraging open-source data and academic-level resources, Polaris pushes the capabilities of open-recipe reasoning models to unprecedented heights. In benchmark tests, our method even surpasses top commercial systems, including Claude-4-Opus, Grok-3-Beta, and o3-mini-high (2025/01/03).
 ## Polaris's Recipe
 - **Data Difficulty:**  Before training, Polaris analyzes and maps the distribution of data difficulty. The dataset should not be overwhelmed by either overly difficult or trivially easy problems. We recommend using a data distribution with a slight bias toward challenging problems, which typically exhibits a mirrored J-shaped distribution.
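The README's Data Difficulty step does not say how difficulty is measured. Here is a minimal sketch, assuming difficulty is the number of correct solutions out of k samples drawn per problem from the model being trained. It tabulates the distribution and runs two rough checks: how much of the dataset sits at the two extremes, and whether harder bins outnumber easier ones. The function names and the half-versus-half test are hypothetical illustrations, not part of Polaris. They do not test for a mirrored J shape specifically.

```python
from collections import Counter

# Assumption (not from Polaris): a problem's difficulty is the number of
# correct solutions out of k samples drawn for it.


def difficulty_histogram(num_correct: list[int], k: int) -> list[int]:
    """Bucket problems by how many of k sampled solutions were correct.

    Index 0 holds the hardest problems (no correct samples) and index k the
    easiest (every sample correct).
    """
    counts = Counter(num_correct)
    if any(c < 0 or c > k for c in counts):
        raise ValueError("every entry of num_correct must lie in [0, k]")
    return [counts.get(i, 0) for i in range(k + 1)]


def extreme_fractions(hist: list[int]) -> tuple[float, float]:
    """Return the fractions of problems that no sample solved and that every sample solved."""
    total = sum(hist)
    if total == 0:
        return 0.0, 0.0
    return hist[0] / total, hist[-1] / total


def leans_hard(hist: list[int]) -> bool:
    """Crude, hypothetical check for a bias toward challenging problems.

    Compares the total count in the harder half of the bins with the easier
    half; when the number of bins is odd, the middle bin is ignored.
    """
    half = len(hist) // 2
    harder = sum(hist[:half])
    easier = sum(hist[len(hist) - half:])
    return harder > easier


# Hypothetical pass counts for ten problems, each sampled k = 8 times.
hist = difficulty_histogram([0, 0, 1, 1, 1, 2, 2, 3, 4, 8], k=8)
print(hist)                     # [2, 3, 2, 1, 1, 0, 0, 0, 1]
print(extreme_fractions(hist))  # (0.2, 0.1)
print(leans_hard(hist))         # True
```

Reading the histogram from index 0 (hardest) to index k (easiest), a dataset with a slight bias toward challenging problems has its larger counts on the left. The extreme fractions indicate whether either end dominates.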