glorgao/SelectiveDPO-Mistral-7B-SFT-UFBinarized



glorgao committed on May 15

Commit 8077e6a · verified · 1 Parent(s): a1e58b8

Update README.md

Files changed (1)

README.md +1 -1


@@ -5,7 +5,7 @@ datasets:
 base_model:
 - HuggingFaceH4/mistral-7b-sft-beta
 ---
-This model is fine-tuned from the HuggingFaceH4/mistral-7b-sft-beta model using the SelectiveDPO algorithm on the Ultrafeedback_binarized dataset.
+This model is fine-tuned from the HuggingFaceH4/mistral-7b-sft-beta model using [SelectiveDPO](https://huggingface.co/papers/2502.09650) on the Ultrafeedback_binarized dataset.
 For the recipe to reproduce this model, please visit our [GitHub page](https://github.com/glorgao/SelectiveDPO).