sslu committed on
Commit 43f1c77 · verified · 1 Parent(s): fb5d60f

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -27,7 +27,7 @@ The InfiAlign framework offers multiple variants tailored for different alignmen
 
  * **[InfiAlign-Qwen-7B-SFT](https://huggingface.co/InfiX-ai/InfiAlign-Qwen-7B-SFT)**: Fine-tuned using curriculum-style instruction data.
  * **[InfiAlign-Qwen-7B-DPO](https://huggingface.co/InfiX-ai/InfiAlign-Qwen-7B-DPO)**: Trained with Direct Preference Optimization (DPO) to improve reasoning alignment. **\[You are here!]**
- * **[InfiAlign-Qwen-7B-R1](https://huggingface.co/InfiX-ai/InfiAlign-Qwen-7B-R1)**: Reinforcement learning variant (GRPO) for further refinement.
+ * **[InfiAlign-Qwen-7B-R1](# "Stay tuned")**: Reinforcement learning variant (GRPO) for further refinement.
 
  ## 📋 Model Description
 