Sunshine279/gammaPO-llama-3-8b-instruct

alignment-handbook

Generated from Trainer


Sunshine279 committed 28 days ago

Commit a52f733 · verified · 1 Parent(s): 7a5bec9

Update README.md

Files changed (1)

README.md +40 -29


@@ -1,47 +1,58 @@
 ---
-base_model: Sunshine279/gammaPO-llama-3-8b-instruct
 tags:
-- alignment-handbook
-- generated_from_trainer
-- arxiv:2506.03690
 datasets:
-- princeton-nlp/llama3-ultrafeedback-armorm
 model-index:
-- name: Sunshine279/gammaPO-llama-3-8b-instruct
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](None)
-# llama-3-8b-it-gmsimpo-beta10-gm0.4-tau10-lr1e-6
-This model is a fine-tuned version of [Sunshine279/gammaPO-llama-3-8b-instruct](https://huggingface.co/Sunshine279/gammaPO-llama-3-8b-instruct) on the princeton-nlp/llama3-ultrafeedback-armorm dataset.
-It achieves the following results on the evaluation set:
-- Loss: 1.1389
-- Rewards/chosen: -20.8453
-- Rewards/rejected: -29.4063
-- Rewards/accuracies: 0.8679
-- Rewards/margins: 8.5610
-- Logps/rejected: -2.9406
-- Logps/chosen: -2.0845
-- Logits/rejected: -1.7197
-- Logits/chosen: -1.7101
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters

 ---
+base_model: meta-llama/Meta-Llama-3-8B-Instruct
 tags:
+  - alignment-handbook
+  - generated_from_trainer
 datasets:
+  - princeton-nlp/llama3-ultrafeedback-armorm
 model-index:
+  - name: Sunshine279/gammaPO-llama-3-8b-instruct
+    results: []
+license: mit
 ---
+## Model Details
+### Model Description
+We fine-tuned [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on [princeton-nlp/llama3-ultrafeedback-armorm](https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback-armorm) with the gamma-SimPO objective.
+ - Developed by: Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang
+ - Model type: Causal Language Model
+ - License: mit
+ - Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct
+### Model Sources
+ - Repository: https://github.com/sunjie279/gammaPO
+ - Paper: https://arxiv.org/pdf/2506.03690
+### How to Get Started with the Model
+```python
+import torch
+from transformers import pipeline
+model_id = "Sunshine279/gammaPO-llama-3-8b-instruct"
+generator = pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16},
+    device="cuda",
+)
+outputs = generator([{"role": "user", "content": "What's the difference between llamas and alpacas?"}],
+                      do_sample=False,
+                      eos_token_id=[generator.tokenizer.convert_tokens_to_ids("<|eot_id|>"), generator.tokenizer.eos_token_id],
+                      max_new_tokens=200)
+print(outputs[0]['generated_text'])
+```
+## Training details
 ### Training hyperparameters