Update README.md
README.md CHANGED
@@ -27,7 +27,7 @@ The InfiAlign framework offers multiple variants tailored for different alignment
 
 * **[InfiAlign-Qwen-7B-SFT](https://huggingface.co/InfiX-ai/InfiAlign-Qwen-7B-SFT)**: Fine-tuned using curriculum-style instruction data.
 * **[InfiAlign-Qwen-7B-DPO](https://huggingface.co/InfiX-ai/InfiAlign-Qwen-7B-DPO)**: Trained with Direct Preference Optimization (DPO) to improve reasoning alignment. **\[You are here!]**
-* **[InfiAlign-Qwen-7B-R1](
+* **[InfiAlign-Qwen-7B-R1](# "Stay tuned")**: Reinforcement learning variant (GRPO) for further refinement.
 
 ## 📋 Model Description
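For context, a minimal sketch of how the DPO checkpoint linked in this list can be loaded with the standard Hugging Face transformers API. The model ID comes from the README; the prompt and generation settings are illustrative assumptions, not documented defaults.

```python
# A minimal usage sketch (not part of the diff above): loading the DPO
# variant with transformers. Prompt and generation parameters below are
# illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "InfiX-ai/InfiAlign-Qwen-7B-DPO"  # repo linked in the README

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Qwen-based chat models expect the chat template applied to the messages.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```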