---
library_name: transformers
license: cc-by-nc-4.0
base_model: hardlyworking/4Brp
tags:
- axolotl
- generated_from_trainer
datasets:
- PocketDoc/Dans-Prosemaxx-RepRemover-1
model-index:
- name: 4Brepremover
  results: []
---

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.11.0.dev0`
```yaml
base_model: hardlyworking/4Brp

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: PocketDoc/Dans-Prosemaxx-RepRemover-1
    type: dan-chat-advanced
val_set_size: 0
output_dir: ./outputs/out
dataset_prepared_path: last_run_prepared
shuffle_merged_datasets: true

hub_model_id: hardlyworking/4Brepremover
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true

sequence_len: 32768
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

wandb_project: new4B
wandb_entity:
wandb_watch:
wandb_name: new4Brep
wandb_log_model:

evals_per_epoch:
eval_table_size:
eval_max_new_tokens:

gradient_accumulation_steps: 1
micro_batch_size: 8
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: offload
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

deepspeed:

warmup_ratio: 0.05
saves_per_epoch: 1
debug:
weight_decay: 0.01
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|endoftext|>
```

</details><br>

# 4Brepremover

This model is a fine-tuned version of [hardlyworking/4Brp](https://huggingface.co/hardlyworking/4Brp) on the [PocketDoc/Dans-Prosemaxx-RepRemover-1](https://huggingface.co/datasets/PocketDoc/Dans-Prosemaxx-RepRemover-1) dataset.
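
For quick local testing, the checkpoint can be loaded with the standard `transformers` API. The snippet below is a minimal sketch, not an official usage guide: it assumes the hosted tokenizer for `hardlyworking/4Brepremover` ships a chat template (training used dan-chat-advanced-formatted conversations) and that a bf16-capable GPU is available; the example prompt is made up.

```python
# Minimal sketch: load the checkpoint and run one chat turn.
# Assumes the hosted tokenizer includes a chat template and a CUDA GPU is present.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hardlyworking/4Brepremover"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the run was trained in bf16
    device_map="auto",
)

messages = [{"role": "user", "content": "Write two sentences about repetition in prose."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```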

## Model description

4Brepremover is a full fine-tune of [hardlyworking/4Brp](https://huggingface.co/hardlyworking/4Brp), trained with Axolotl on conversations in the dan-chat-advanced format at a 32,768-token sequence length with sample packing.

## Intended uses & limitations

The model is released under the CC BY-NC 4.0 license, so it is restricted to non-commercial use. No further evaluation of its capabilities or limitations has been published.

## Training and evaluation data

Training used the [PocketDoc/Dans-Prosemaxx-RepRemover-1](https://huggingface.co/datasets/PocketDoc/Dans-Prosemaxx-RepRemover-1) dataset. No validation split was held out (`val_set_size: 0`), so no evaluation metrics are reported.
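
To inspect the training data, the dataset can be pulled with the `datasets` library pinned under Framework versions below. A small sketch; the `train` split name is an assumption about the hub repo, not something stated in this card:

```python
# Sketch: peek at the training dataset from the Hugging Face Hub.
# Assumes the repo exposes a "train" split.
from datasets import load_dataset

ds = load_dataset("PocketDoc/Dans-Prosemaxx-RepRemover-1", split="train")
print(ds)      # column names and row count
print(ds[0])   # one raw conversation record
```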

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: 8-bit AdamW (bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- weight_decay: 0.01
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 8 (from warmup_ratio: 0.05)
- num_epochs: 3
- training_steps: 174
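
For readers who prefer plain `transformers` over the Axolotl config above, the hyperparameters map approximately onto `TrainingArguments` as sketched below. This is an illustrative reconstruction, not the command that produced the checkpoint: it assumes a single device and omits Axolotl-specific features (sample packing, the Liger and Cut Cross Entropy plugins).

```python
# Approximate Trainer-side equivalent of the hyperparameters above.
# Illustrative only: sample packing and the Axolotl plugins have no direct analogue here.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./outputs/out",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.01,
    optim="adamw_bnb_8bit",  # OptimizerNames.ADAMW_BNB
    bf16=True,
    tf32=False,
    logging_steps=1,
    save_strategy="epoch",   # saves_per_epoch: 1
    seed=42,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```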

### Training results

No evaluation metrics were logged for this run, since no validation set was configured (`val_set_size: 0`).

### Framework versions

- Transformers 4.53.1
- PyTorch 2.6.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.2