bethrezen committed
Commit 3283fbf · verified · 1 Parent(s): 476ef85

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,133 +1,58 @@
  ---
- library_name: transformers
- license: apache-2.0
  base_model: openai/gpt-oss-20b
  tags:
  - generated_from_trainer
- datasets:
- - HuggingFaceH4/Multilingual-Thinking
- model-index:
- - name: workspace/data/outputs/gpt-oss-out/
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.12.0.dev0`
- ```yaml
- base_model: openai/gpt-oss-20b
- use_kernels: true
- model_quantization_config: Mxfp4Config
- model_quantization_config_kwargs:
-   dequantize: true
-
- plugins:
-   - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
-
- experimental_skip_move_to_device: true  # prevent OOM by NOT putting model to GPU before sharding
-
- datasets:
-   - path: HuggingFaceH4/Multilingual-Thinking
-     type: chat_template
-     field_thinking: thinking
-     template_thinking_key: thinking
-
- dataset_prepared_path: last_run_prepared
- val_set_size: 0
- output_dir: /workspace/data/outputs/gpt-oss-out/
-
- sequence_len: 8196
- sample_packing: true
- pad_to_sequence_len: true
-
- wandb_project: gpt-oss-20b
- wandb_name: multilingual-reasoning-fft
-
- gradient_accumulation_steps: 1
- micro_batch_size: 2
- num_epochs: 1
-
- optimizer: adamw_torch_fused
- lr_scheduler: constant_with_warmup
- learning_rate: 2e-5
-
- bf16: true
- tf32: true
-
- flash_attention: true
- attn_implementation: kernels-community/vllm-flash-attn3
-
- gradient_checkpointing: true
- #activation_offloading: true
-
- logging_steps: 1
- saves_per_epoch: 1
-
- warmup_ratio: 0.03
-
- special_tokens:
- eot_tokens:
-   - "<|end|>"
-   - "<|return|>"
-
- deepspeed: /pi-workspace/zero3.json
-
- # fsdp_version: 2
- # fsdp_config:
- #   offload_params: false
- #   state_dict_type: SHARDED_STATE_DICT
- #   auto_wrap_policy: TRANSFORMER_BASED_WRAP
- #   transformer_layer_cls_to_wrap: GptOssDecoderLayer
- #   reshard_after_forward: true
- #   # cpu_ram_efficient_loading: true
- ```
-
- </details><br>
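
For reference, the removed config above loads the MXFP4-quantized base checkpoint and dequantizes it before full fine-tuning. A minimal sketch of the equivalent load in plain Transformers, assuming the `Mxfp4Config` API available in Transformers 4.55 (the version this card lists):

```python
from transformers import AutoModelForCausalLM, Mxfp4Config

# Dequantize the MXFP4 checkpoint to bf16 at load time, mirroring
# model_quantization_config_kwargs: {dequantize: true} in the config above.
quantization_config = Mxfp4Config(dequantize=True)
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype="bfloat16",
    quantization_config=quantization_config,
)
```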
-
- # workspace/data/outputs/gpt-oss-out/
-
- This model is a fine-tuned version of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) on the HuggingFaceH4/Multilingual-Thinking dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - total_train_batch_size: 16
- - total_eval_batch_size: 16
- - optimizer: ADAMW_TORCH_FUSED (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- - lr_scheduler_type: constant_with_warmup
- - training_steps: 8
-
- ### Training results
-
- ### Framework versions
-
- - Transformers 4.55.0
- - Pytorch 2.8.0+cu128
- - Datasets 4.0.0
- - Tokenizers 0.21.4

  ---
  base_model: openai/gpt-oss-20b
+ library_name: transformers
+ model_name: gpt-oss-20b-multilingual-reasoner
  tags:
  - generated_from_trainer
+ - trl
+ - sft
+ license: apache-2.0
  ---

+ # Model Card for gpt-oss-20b-multilingual-reasoner
+
+ This model is a fine-tuned version of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="bethrezen/gpt-oss-20b-multilingual-reasoner", device="cuda")  # assumed repo id; the auto-generated card had model="None"
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
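
The same generation without the `pipeline` wrapper, as a sketch using the standard chat-template API; the repo id is again an assumption, not confirmed by the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bethrezen/gpt-oss-20b-multilingual-reasoner"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Which would you choose: past or future?"}]
# apply_chat_template formats the messages with the model's chat template
# and returns a tensor of input ids ready for generate().
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```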
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/b37h3z3n/huggingface/runs/4059q72j)
+
+ This model was trained with SFT.
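
For orientation, a minimal TRL SFT sketch consistent with the framework versions listed below; the dataset comes from the previous card, while the split and hyperparameters are illustrative assumptions rather than the exact run configuration:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Dataset named in the previous card; the split is an assumption.
dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")

trainer = SFTTrainer(
    model="openai/gpt-oss-20b",
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="gpt-oss-20b-multilingual-reasoner",
        per_device_train_batch_size=2,  # matches the previous card's micro batch size
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
    ),
)
trainer.train()
```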
+
+ ### Framework versions
+
+ - TRL: 0.21.0
+ - Transformers: 4.55.0
+ - Pytorch: 2.8.0+cu128
+ - Datasets: 4.0.0
+ - Tokenizers: 0.21.4
+
+ ## Citations
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+     title        = {{TRL: Transformer Reinforcement Learning}},
+     author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+     year         = 2020,
+     journal      = {GitHub repository},
+     publisher    = {GitHub},
+     howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
model-00001-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1e66b52040a9a0777a9c94ad800d7dff662a58c96129efbfd77b54ef553560bd
  size 4504304664

  version https://git-lfs.github.com/spec/v1
+ oid sha256:c1ef9a81309d602c303a1cb43b568d9e4a2faf761cb676b62af26182511e1452
  size 4504304664
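
The weight shards are stored through Git LFS, so each diff here only touches the three-line pointer file (spec version, sha256 oid, byte size). A small sketch for checking a downloaded shard against its pointer:

```python
import hashlib

def matches_lfs_pointer(path: str, expected_oid: str, expected_size: int) -> bool:
    """Return True if the file's sha256 digest and byte size match the LFS pointer."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so multi-GB shards don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest() == expected_oid and size == expected_size

# For the first shard in this commit:
print(matches_lfs_pointer(
    "model-00001-of-00009.safetensors",
    "c1ef9a81309d602c303a1cb43b568d9e4a2faf761cb676b62af26182511e1452",
    4504304664,
))
```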
model-00002-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9cbe12bb05ec8772ff87808f8916a52f9c330075813be3a978e6b968ba2aa52b
  size 4939127656

  version https://git-lfs.github.com/spec/v1
+ oid sha256:56dd70d969c9f77ce0b7e79424919d09db0882cb22a2ea7fcd16a7c7267e6530
  size 4939127656
model-00003-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:77057a058259fdad6458a2cc253c248a9a308c43390b2e1eaeff108144d1502e
  size 4939127656

  version https://git-lfs.github.com/spec/v1
+ oid sha256:a535b6916849716c429e46d9e8c190ba7c49e54b48bd070416fb42ff3e0c3129
  size 4939127656
model-00004-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:72cc1b6291eec305ae8b01a4f7ab53eb15489496d4fe36b7613d822cf307ccf2
  size 4939127680

  version https://git-lfs.github.com/spec/v1
+ oid sha256:ec38b8349e47cee4a7fc123845e10dab215b123ed15a7090465c93558ebcf158
  size 4939127680
model-00005-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7c0775267ff51f292d952a8a06d7d7a4cb709ea9fa48433f5dde5dc1ead0d84f
  size 4939127704

  version https://git-lfs.github.com/spec/v1
+ oid sha256:ac94173d30a5de5a17382c79bf0cabef29c1d34d27fba3128e701ba9aa012e4c
  size 4939127704
model-00006-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:89e821b90576a93e94a9457d21012a184e3d3336cd8a499b5040dcc9fa8791c8
  size 4939127704

  version https://git-lfs.github.com/spec/v1
+ oid sha256:0ccc09d6d016bef393bfb1f024a448a9a5b51db9d15dec718ba3af819288efb1
  size 4939127704
model-00007-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3f2b0083866777b6ef7405acca40a46242005bbebb423f09bd3223880cc94ee8
  size 4939127704

  version https://git-lfs.github.com/spec/v1
+ oid sha256:3c9ca478c80cd187756bae63154e5afec330448bd59c6969937bbcba8ad2ed7b
  size 4939127704
model-00008-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2b97d39f740e42cf0bbe9e15a6ad59dae1b93396ca2d85899c4bb9d0b6882192
  size 4939127704

  version https://git-lfs.github.com/spec/v1
+ oid sha256:069f7d27321f545ede37d5716ce4d2dd1aa6f21728a81a9c5a499a76d789d893
  size 4939127704
model-00009-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:549eb052e40666b8b016c7a6db7031e752785cec977beed6a315fea12489caef
  size 2751362856

  version https://git-lfs.github.com/spec/v1
+ oid sha256:4d0722a3185a07dabe2c008116e476336449ba7606dc64e7cc8a4435c508c7e5
  size 2751362856
model.safetensors.index.json CHANGED
@@ -1,6 +1,6 @@
  {
    "metadata": {
-     "total_parameters": 335424,
      "total_size": 41829514368
    },
    "weight_map": {

  {
    "metadata": {
+     "total_parameters": 4759104,
      "total_size": 41829514368
    },
    "weight_map": {
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:29e1c313582c8dd3b40fa45dcf6d6482aeabf058adc5837643ba6a5b2ecdb37c
- size 9489

  version https://git-lfs.github.com/spec/v1
+ oid sha256:5a98f69af00540d86783a5d39f060a49f94d8d9d804afba9346844a3a419e3ca
+ size 7569