Sunshine279 committed
Commit 0e99e25 · verified · 1 Parent(s): c5c1ce9

Update README.md

Files changed (1)
  1. README.md +41 -30
README.md CHANGED
@@ -1,47 +1,58 @@
  ---
- base_model: Sunshine279/gammaPO-gemma-2-9b-it
  tags:
- - alignment-handbook
- - generated_from_trainer
- - arxiv:2506.03690
  datasets:
- - princeton-nlp/gemma2-ultrafeedback-armorm
  model-index:
- - name: gemma-2-9b-it-gmsimpo-beta10-gm0.5-tau20-lr8e-7
- results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](None)
- # gemma-2-9b-it-gmsimpo-beta10-gm0.5-tau20-lr8e-7

- This model is a fine-tuned version of [Sunshine279/gammaPO-gemma-2-9b-it](https://huggingface.co/Sunshine279/gammaPO-gemma-2-9b-it) on the princeton-nlp/gemma2-ultrafeedback-armorm dataset.
- It achieves the following results on the evaluation set:
- - Loss: 2.5622
- - Rewards/chosen: -18.1350
- - Rewards/rejected: -23.0307
- - Rewards/accuracies: 0.7828
- - Rewards/margins: 4.8958
- - Logps/rejected: -2.3031
- - Logps/chosen: -1.8135
- - Logits/rejected: -15.8316
- - Logits/chosen: -15.8114
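A quick plausibility check on the eval numbers above: under SimPO's length-normalized reward $r(y) = \frac{\beta}{|y|}\log\pi_\theta(y \mid x)$, Rewards/* should be $\beta$ times Logps/*. Assuming $\beta = 10$, read off the run name's `beta10` (not stated explicitly in the card), the listed values line up:

```python
# Plausibility check of the eval metrics above, assuming the SimPO-style
# length-normalized reward r(y) = (beta / |y|) * log pi(y|x), so that
# Rewards/* = beta * Logps/*. beta = 10 is inferred from the run name
# ("beta10") and is an assumption, not stated in the card.
beta = 10.0
logps = {"chosen": -1.8135, "rejected": -2.3031}  # Logps/* rows above

rewards = {k: round(beta * v, 4) for k, v in logps.items()}
margin = round(rewards["chosen"] - rewards["rejected"], 4)

print(rewards)  # {'chosen': -18.135, 'rejected': -23.031}
print(margin)   # 4.896
```

The computed margin of 4.896 matches the reported Rewards/margins of 4.8958 up to rounding in the logged Logps values.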

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

  ### Training hyperparameters

@@ -72,4 +83,4 @@ The following hyperparameters were used during training:
  - Transformers 4.42.4
  - Pytorch 2.3.0+cu121
  - Datasets 2.20.0
- - Tokenizers 0.19.1

  ---
+ base_model: google/gemma-2-9b-it
  tags:
+ - alignment-handbook
+ - generated_from_trainer
  datasets:
+ - princeton-nlp/gemma2-ultrafeedback-armorm
  model-index:
+ - name: Sunshine279/gammaPO-gemma-2-9b-it
+ results: []
+ license: mit
  ---

+ ## Model Details

+ ### Model Description

+ We fine-tuned google/gemma-2-9b-it on princeton-nlp/gemma2-ultrafeedback-armorm with the $\gamma$-SimPO objective.
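For reference, $\gamma$-SimPO builds on SimPO's reference-free, length-normalized objective. A sketch of the base SimPO loss (the $\gamma$-specific adaptive-margin modification is defined in the paper linked below):

$$
\mathcal{L}_{\mathrm{SimPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) \;-\; \gamma\right)\right]
$$

Here $\beta$ scales the average per-token log-probability reward and $\gamma$ is the target reward margin between the chosen response $y_w$ and the rejected response $y_l$; $\gamma$-SimPO adapts this margin rather than keeping it fixed.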

+ - Developed by: Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang
+ - Model type: Causal Language Model
+ - License: gemma
+ - Finetuned from model: google/gemma-2-9b-it

+ ### Model Sources

+ - Repository: https://github.com/sunjie279/gammaPO
+ - Paper: https://arxiv.org/pdf/2506.03690
+
+ ### How to Get Started with the Model
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "Sunshine279/gammaPO-gemma-2-9b-it"
+
+ generator = pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device="cuda",
+ )
+ outputs = generator(
+     [{"role": "user", "content": "What's the difference between llamas and alpacas?"}],
+     do_sample=False,
+     eos_token_id=[generator.tokenizer.convert_tokens_to_ids("<end_of_turn>"), generator.tokenizer.eos_token_id],
+     max_new_tokens=200,
+ )
+ print(outputs[0]['generated_text'])
+ ```
+
+ ## Training details

  ### Training hyperparameters

  - Transformers 4.42.4
  - Pytorch 2.3.0+cu121
  - Datasets 2.20.0
+ - Tokenizers 0.19.1