Sunshine279 committed
Commit 0e99e25 · verified · 1 Parent(s): c5c1ce9

Update README.md

Files changed (1)
  1. README.md +41 -30
README.md CHANGED
@@ -1,47 +1,58 @@
  ---
- base_model: Sunshine279/gammaPO-gemma-2-9b-it
  tags:
- - alignment-handbook
- - generated_from_trainer
- - arxiv:2506.03690
  datasets:
- - princeton-nlp/gemma2-ultrafeedback-armorm
  model-index:
- - name: gemma-2-9b-it-gmsimpo-beta10-gm0.5-tau20-lr8e-7
- results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](None)
- # gemma-2-9b-it-gmsimpo-beta10-gm0.5-tau20-lr8e-7

- This model is a fine-tuned version of [Sunshine279/gammaPO-gemma-2-9b-it](https://huggingface.co/Sunshine279/gammaPO-gemma-2-9b-it) on the princeton-nlp/gemma2-ultrafeedback-armorm dataset.
- It achieves the following results on the evaluation set:
- - Loss: 2.5622
- - Rewards/chosen: -18.1350
- - Rewards/rejected: -23.0307
- - Rewards/accuracies: 0.7828
- - Rewards/margins: 4.8958
- - Logps/rejected: -2.3031
- - Logps/chosen: -1.8135
- - Logits/rejected: -15.8316
- - Logits/chosen: -15.8114
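A quick plausibility check on the eval numbers above: under SimPO's length-normalized reward $r(y) = \frac{\beta}{|y|}\log\pi_\theta(y \mid x)$, Rewards/* should be $\beta$ times Logps/*. Assuming $\beta = 10$, read off the run name's `beta10` (not stated explicitly in the card), the listed values line up:

```python
# Plausibility check of the eval metrics above, assuming the SimPO-style
# length-normalized reward r(y) = (beta / |y|) * log pi(y|x), so that
# Rewards/* = beta * Logps/*. beta = 10 is inferred from the run name
# ("beta10") and is an assumption, not stated in the card.
beta = 10.0
logps = {"chosen": -1.8135, "rejected": -2.3031}  # Logps/* rows above

rewards = {k: round(beta * v, 4) for k, v in logps.items()}
margin = round(rewards["chosen"] - rewards["rejected"], 4)

print(rewards)  # {'chosen': -18.135, 'rejected': -23.031}
print(margin)   # 4.896
```

The computed margin of 4.896 matches the reported Rewards/margins of 4.8958 up to rounding in the logged Logps values.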

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

  ### Training hyperparameters

@@ -72,4 +83,4 @@ The following hyperparameters were used during training:
  - Transformers 4.42.4
  - Pytorch 2.3.0+cu121
  - Datasets 2.20.0
- - Tokenizers 0.19.1

  ---
+ base_model: google/gemma-2-9b-it
  tags:
+ - alignment-handbook
+ - generated_from_trainer
  datasets:
+ - princeton-nlp/gemma2-ultrafeedback-armorm
  model-index:
+ - name: Sunshine279/gammaPO-gemma-2-9b-it
+ results: []
+ license: mit
  ---

+ ## Model Details

+ ### Model Description

+ We fine-tuned google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm with the $\gamma$-SimPO objective.
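For reference, $\gamma$-SimPO builds on SimPO's reference-free, length-normalized objective. A sketch of the base SimPO loss (the $\gamma$-specific adaptive-margin modification is defined in the paper linked below):

$$
\mathcal{L}_{\mathrm{SimPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) \;-\; \gamma\right)\right]
$$

Here $\beta$ scales the average per-token log-probability reward and $\gamma$ is the target reward margin between the chosen response $y_w$ and the rejected response $y_l$; $\gamma$-SimPO adapts this margin rather than keeping it fixed.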

+ - Developed by: Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang
+ - Model type: Causal Language Model
+ - License: gemma
+ - Finetuned from model: google/gemma-2-9b-it

+ ### Model Sources

+ - Repository: https://github.com/sunjie279/gammaPO
+ - Paper: https://arxiv.org/pdf/2506.03690
+
+ ### How to Get Started with the Model
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "Sunshine279/gammaPO-gemma-2-9b-it"
+
+ generator = pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device="cuda",
+ )
+ outputs = generator(
+     [{"role": "user", "content": "What's the difference between llamas and alpacas?"}],
+     do_sample=False,
+     eos_token_id=[generator.tokenizer.convert_tokens_to_ids("<end_of_turn>"), generator.tokenizer.eos_token_id],
+     max_new_tokens=200,
+ )
+ print(outputs[0]['generated_text'])
+ ```
+
+ ## Training details

  ### Training hyperparameters

  - Transformers 4.42.4
  - Pytorch 2.3.0+cu121
  - Datasets 2.20.0
+ - Tokenizers 0.19.1