tsessk committed on
Commit 6ec1942 · verified · 1 Parent(s): 773a1e3

Model save

Files changed (2)
  1. README.md +4 -4
  2. generation_config.json +1 -1
README.md CHANGED

```diff
@@ -27,7 +27,7 @@ print(output["generated_text"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/tsessk/SmolLM2-alignment/runs/d16rjtrf)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/tsessk/SmolLM2-alignment/runs/95alwlls)
 
 
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
@@ -35,10 +35,10 @@ This model was trained with DPO, a method introduced in [Direct Preference Optim
 ### Framework versions
 
 - TRL: 0.17.0
-- Transformers: 4.51.3
-- Pytorch: 2.6.0+cu124
+- Transformers: 4.48.3
+- Pytorch: 2.5.1+cu124
 - Datasets: 3.6.0
-- Tokenizers: 0.21.1
+- Tokenizers: 0.21.0
 
 ## Citations
 
```
generation_config.json CHANGED

```diff
@@ -3,5 +3,5 @@
   "bos_token_id": 1,
   "eos_token_id": 2,
   "pad_token_id": 2,
-  "transformers_version": "4.51.3"
+  "transformers_version": "4.48.3"
 }
```
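The only substantive change to generation_config.json is the `transformers_version` stamp written at save time. A quick sanity check like the following (plain `json`, no Transformers dependency) confirms the new file parses and that the special-token ids are internally consistent; the string below reproduces the post-change file content from the diff:

```python
import json

new_config = """{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2,
  "transformers_version": "4.48.3"
}"""

cfg = json.loads(new_config)
# The eos token id is reused as the pad token id here, a common setup
# for models that were pretrained without a dedicated padding token.
assert cfg["pad_token_id"] == cfg["eos_token_id"]
```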