Sunshine279 committed on
Commit a52f733 · verified · 1 Parent(s): 7a5bec9

Update README.md

Files changed (1)
  1. README.md +40 -29
README.md CHANGED
@@ -1,47 +1,58 @@
  ---
- base_model: Sunshine279/gammaPO-llama-3-8b-instruct
  tags:
- - alignment-handbook
- - generated_from_trainer
- - arxiv:2506.03690
  datasets:
- - princeton-nlp/llama3-ultrafeedback-armorm
  model-index:
- - name: Sunshine279/gammaPO-llama-3-8b-instruct
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](None)
- # llama-3-8b-it-gmsimpo-beta10-gm0.4-tau10-lr1e-6

- This model is a fine-tuned version of [Sunshine279/gammaPO-llama-3-8b-instruct](https://huggingface.co/Sunshine279/gammaPO-llama-3-8b-instruct) on the princeton-nlp/llama3-ultrafeedback-armorm dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.1389
- - Rewards/chosen: -20.8453
- - Rewards/rejected: -29.4063
- - Rewards/accuracies: 0.8679
- - Rewards/margins: 8.5610
- - Logps/rejected: -2.9406
- - Logps/chosen: -2.0845
- - Logits/rejected: -1.7197
- - Logits/chosen: -1.7101

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

  ### Training hyperparameters
  ---
+ base_model: meta-llama/Meta-Llama-3-8B-Instruct
  tags:
+ - alignment-handbook
+ - generated_from_trainer
  datasets:
+ - princeton-nlp/llama3-ultrafeedback-armorm
  model-index:
+ - name: Sunshine279/gammaPO-llama-3-8b-instruct
+   results: []
+ license: mit
  ---

+ ## Model Details
+
+ ### Model Description
+
+ We fine-tuned [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on [princeton-nlp/llama3-ultrafeedback-armorm](https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback-armorm) with the gamma-SimPO objective.
+
+ - Developed by: Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang
+
+ - Model type: Causal Language Model
+
+ - License: mit
+
+ - Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct
+
+ ### Model Sources
+
+ - Repository: https://github.com/sunjie279/gammaPO
+
+ - Paper: https://arxiv.org/pdf/2506.03690
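gamma-SimPO builds on SimPO's reference-free, length-normalized reward with a target reward margin gamma; the adaptive per-example scheduling of gamma is the paper's contribution and is defined there. As a minimal sketch of the base SimPO loss for a single preference pair (the `beta` and `gamma` values here are illustrative, not the training configuration):

```python
import math

def simpo_loss(chosen_logp, rejected_logp, chosen_len, rejected_len,
               beta=10.0, gamma=1.0):
    """Sigmoid preference loss on length-normalized implicit rewards (SimPO).

    gamma is the target reward margin; gamma-SimPO adapts it per example
    instead of keeping it fixed (see the paper/repo above).
    """
    # Length-normalized implicit rewards, no reference model needed.
    r_chosen = beta * chosen_logp / chosen_len
    r_rejected = beta * rejected_logp / rejected_len
    # -log sigmoid(margin): small once the chosen response's reward
    # exceeds the rejected one's by at least gamma.
    margin = r_chosen - r_rejected - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss shrinks as the chosen response's per-token log-probability pulls further ahead of the rejected one's.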
+
+ ### How to Get Started with the Model
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "Sunshine279/gammaPO-llama-3-8b-instruct"
+
+ generator = pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device="cuda",
+ )
+ # Llama 3 marks the end of a turn with <|eot_id|>.
+ outputs = generator(
+     [{"role": "user", "content": "What's the difference between llamas and alpacas?"}],
+     do_sample=False,
+     eos_token_id=[generator.tokenizer.convert_tokens_to_ids("<|eot_id|>"), generator.tokenizer.eos_token_id],
+     max_new_tokens=200,
+ )
+ # With chat input, generated_text holds the whole conversation;
+ # the last message is the assistant's reply.
+ print(outputs[0]["generated_text"][-1]["content"])
+ ```
+
+ ## Training details

  ### Training hyperparameters