---
license: apache-2.0
library_name: transformers
base_model:
- nbeerbower/Mahou-1.5-mistral-nemo-12B-lorablated
datasets:
- nbeerbower/Schule-DPO
- nbeerbower/Arkhaios-DPO
- nbeerbower/Purpura-DPO
---

![image/png](https://huggingface.co/nbeerbower/mistral-nemo-kartoffel-12B/resolve/main/kartoffel.png?download=true)

# mistral-nemo-kartoffel-12B

[Mahou-1.5-mistral-nemo-12B-lorablated](https://huggingface.co/nbeerbower/Mahou-1.5-mistral-nemo-12B-lorablated) fine-tuned on the datasets listed in the metadata above: `nbeerbower/Schule-DPO`, `nbeerbower/Arkhaios-DPO`, and `nbeerbower/Purpura-DPO`.

### Method

[ORPO-tuned](https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html) on 8x A100 GPUs for 2 epochs.

QLoRA config:
```
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# QLoRA config: load the base model in 4-bit NF4 with double quantization.
# `torch_dtype` is defined elsewhere in the training script (not shown here).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

# LoRA config: rank-16 adapters on all attention and MLP projections
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)
```

Training config:
```
from trl import ORPOConfig

# `new_model` (the run name) is defined elsewhere in the training script (not shown here).
orpo_args = ORPOConfig(
    run_name=new_model,
    learning_rate=8e-6,
    lr_scheduler_type="linear",
    max_length=2048,
    max_prompt_length=1024,
    max_completion_length=1024,
    beta=0.1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.2,  # a value below 1 is a fraction of total training steps
    logging_steps=1,
    warmup_steps=10,
    max_grad_norm=10,
    report_to="wandb",
    output_dir="./results/",
    bf16=True,
    gradient_checkpointing=True,
)
```
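
The batch and evaluation settings above imply a training schedule that can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes plain data parallelism across all 8 GPUs, and `num_examples` is a hypothetical dataset size, since the card does not state the real one. The helper `schedule_summary` is not part of any library, and the exact step counts reported by the trainer may differ slightly between library versions.

```python
import math


def schedule_summary(num_examples, num_gpus=8, per_device_batch=4,
                     grad_accum=1, epochs=2, eval_ratio=0.2):
    """Estimate step counts from the training config (hypothetical helper)."""
    # Examples consumed per optimizer step across all devices
    global_batch = num_gpus * per_device_batch * grad_accum
    # Assumes the last partial batch of each epoch is kept, not dropped
    steps_per_epoch = math.ceil(num_examples / global_batch)
    total_steps = steps_per_epoch * epochs
    # eval_steps < 1 is treated as a fraction of total training steps
    eval_every = math.ceil(total_steps * eval_ratio)
    return {
        "global_batch": global_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "eval_every": eval_every,
    }


# Hypothetical dataset size, chosen only for illustration
summary = schedule_summary(num_examples=10_000)
print(summary)
```

Under these assumptions the global batch size is 32 preference pairs per step, and `eval_steps=0.2` yields roughly five evaluations over the run. With `warmup_steps=10`, the linear schedule reaches the peak learning rate of `8e-6` after the first 10 steps and then decays toward zero.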