hardlyworking/4Bkto

@@ -1,144 +1,68 @@
 ---
 library_name: transformers
-license: cc-by-nc-4.0
-base_model: hardlyworking/4Brp
 tags:
-- axolotl
 - generated_from_trainer
-datasets:
-- PocketDoc/Dans-Prosemaxx-RepRemover-1
-model-index:
-- name: 4Brepremover
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
-<details><summary>See axolotl config</summary>
-axolotl version: `0.11.0.dev0`
-```yaml
-base_model: hardlyworking/4Brp
-load_in_8bit: false
-load_in_4bit: false
-strict: false
-datasets:
-  - path: PocketDoc/Dans-Prosemaxx-RepRemover-1
-    type: dan-chat-advanced
-val_set_size: 0
-output_dir: ./outputs/out
-dataset_prepared_path: last_run_prepared
-shuffle_merged_datasets: true
-hub_model_id: hardlyworking/4Brepremover
-hub_strategy: "all_checkpoints"
-push_dataset_to_hub:
-hf_use_auth_token: true
-plugins:
-  - axolotl.integrations.liger.LigerPlugin
-  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
-liger_rope: true
-liger_rms_norm: true
-liger_layer_norm: true
-liger_glu_activation: true
-liger_fused_linear_cross_entropy: false
-cut_cross_entropy: true
-sequence_len: 32768
-sample_packing: true
-eval_sample_packing: true
-pad_to_sequence_len: true
-wandb_project: new4B
-wandb_entity:
-wandb_watch:
-wandb_name: new4Brep
-wandb_log_model:
-evals_per_epoch:
-eval_table_size:
-eval_max_new_tokens:
-gradient_accumulation_steps: 1
-micro_batch_size: 8
-num_epochs: 3
-optimizer: adamw_bnb_8bit
-lr_scheduler: cosine
-learning_rate: 1e-5
-train_on_inputs: false
-group_by_length: false
-bf16: auto
-fp16:
-tf32: false
-gradient_checkpointing: offload
-gradient_checkpointing_kwargs:
-  use_reentrant: false
-early_stopping_patience:
-resume_from_checkpoint:
-local_rank:
-logging_steps: 1
-xformers_attention:
-flash_attention: true
-s2_attention:
-deepspeed:
-warmup_ratio: 0.05
-saves_per_epoch: 1
-debug:
-weight_decay: 0.01
-fsdp:
-fsdp_config:
-special_tokens:
-   pad_token: <|endoftext|>
 ```
-</details><br>
-# 4Brepremover
-This model is a fine-tuned version of [hardlyworking/4Brp](https://huggingface.co/hardlyworking/4Brp) on the PocketDoc/Dans-Prosemaxx-RepRemover-1 dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 8
-- training_steps: 174
-### Training results
-### Framework versions
-- Transformers 4.53.1
-- Pytorch 2.6.0+cu126
-- Datasets 3.6.0
-- Tokenizers 0.21.2

 ---
+base_model: hardlyworking/4Brepremover
 library_name: transformers
+model_name: 4Bkto
 tags:
 - generated_from_trainer
+- axolotl
+- trl
+- kto
+licence: license
 ---
+# Model Card for 4Bkto
+This model is a fine-tuned version of [hardlyworking/4Brepremover](https://huggingface.co/hardlyworking/4Brepremover).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="hardlyworking/4Bkto", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
 ```
+## Training procedure
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/welchjacob254/New4B/runs/lw2hq72m)
+This model was trained with KTO, a method introduced in [KTO: Model Alignment as Prospect Theoretic Optimization](https://huggingface.co/papers/2402.01306).
+### Framework versions
+- TRL: 0.18.2
+- Transformers: 4.53.1
+- Pytorch: 2.6.0+cu126
+- Datasets: 3.6.0
+- Tokenizers: 0.21.2
+## Citations
+Cite KTO as:
+```bibtex
+@article{ethayarajh2024kto,
+    title        = {{KTO: Model Alignment as Prospect Theoretic Optimization}},
+    author       = {Kawin Ethayarajh and Winnie Xu and Niklas Muennighoff and Dan Jurafsky and Douwe Kiela},
+    year         = 2024,
+    eprint       = {arXiv:2402.01306},
+}
+```
+Cite TRL as:
+```bibtex
+@misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```
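
The training procedure section of the new card names KTO as the training method but does not show the objective. As a rough illustration only, the per-example loss from the cited KTO paper can be sketched in plain Python as below. This is not the TRL implementation and not the configuration used for this model: the function name `kto_loss`, its arguments, and the default values for `beta`, `lambda_d`, and `lambda_u` are assumptions chosen for the example, and the KL reference point is passed in as a precomputed number rather than estimated from a batch.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(policy_logps, ref_logps, desirable, kl_estimate,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Mean KTO loss over a batch (illustrative sketch, hypothetical signature).

    policy_logps, ref_logps: summed log-probabilities of each completion
        under the policy and the frozen reference model.
    desirable: booleans, True for a desirable completion, False otherwise.
    kl_estimate: estimate of KL(policy || reference), used as the
        reference point z0 and clamped at zero.
    """
    z0 = max(kl_estimate, 0.0)
    losses = []
    for policy_lp, ref_lp, is_desirable in zip(policy_logps, ref_logps, desirable):
        # Implied reward: log-ratio of policy to reference.
        reward = policy_lp - ref_lp
        if is_desirable:
            # Loss shrinks as the reward rises above the reference point.
            value = lambda_d * sigmoid(beta * (reward - z0))
            losses.append(lambda_d - value)
        else:
            # Loss shrinks as the reward falls below the reference point.
            value = lambda_u * sigmoid(beta * (z0 - reward))
            losses.append(lambda_u - value)
    return sum(losses) / len(losses)


# Toy batch: one desirable and one undesirable completion.
print(kto_loss([-12.0, -15.0], [-14.0, -13.0], [True, False], kl_estimate=0.5))
```

Unlike pairwise preference losses, this objective only needs a binary desirable/undesirable label per completion, which is why the sketch takes a flat list of flags rather than chosen/rejected pairs.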