---
license: apache-2.0
tags:
- dpo
base_model:
- CorticalStack/pastiche-crown-clown-7b-dare
datasets:
- jondurbin/truthy-dpo-v0.1
---
|
|
|
<img src="pastiche-crown-clown.png" alt="Pastiche crown clown logo" width="800" style="margin-left:auto; margin-right:auto; display:block;"/>
|
|
|
# CorticalStack/pastiche-crown-clown-7b-dare-dpo |
|
|
|
CorticalStack/pastiche-crown-clown-7b-dare-dpo is a version of [CorticalStack/pastiche-crown-clown-7b-dare](https://huggingface.co/CorticalStack/pastiche-crown-clown-7b-dare) fine-tuned with Direct Preference Optimization (DPO) on the [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1) dataset.
|
|
|
### LoRA |
|
- r: 16 |
|
- LoRA alpha: 16 |
|
- LoRA dropout: 0.05 |
|
|
|
### Training arguments |
|
- Batch size: 4 |
|
- Gradient accumulation steps: 4 |
|
- Optimizer: paged_adamw_32bit |
|
- Max steps: 200 |
|
- Learning rate: 5e-05 |
|
- Learning rate scheduler type: cosine |
|
- Beta: 0.1 |
|
- Max prompt length: 1024 |
|
- Max length: 1536 |
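Putting the two sections together, a run with these arguments might look like the following sketch using `trl`'s `DPOTrainer`. This is an illustrative reconstruction, not the exact training script: the `output_dir`, full-precision model loading, and the `DPOTrainer` keyword names (which vary across `trl` versions) are assumptions.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "CorticalStack/pastiche-crown-clown-7b-dare"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs (prompt / chosen / rejected) from the truthy DPO dataset.
dataset = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")

# LoRA settings from the section above.
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

# Training arguments as listed above; output_dir is an assumption.
args = DPOConfig(
    output_dir="pastiche-crown-clown-7b-dare-dpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16
    optim="paged_adamw_32bit",
    max_steps=200,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    beta=0.1,                        # strength of the KL penalty in the DPO loss
    max_prompt_length=1024,          # prompt token budget
    max_length=1536,                 # prompt + completion token budget
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,      # named `tokenizer` in older trl releases
    peft_config=peft_config,
)
trainer.train()
```

With `peft_config` passed in, `DPOTrainer` wraps the base model in LoRA adapters and uses the frozen base weights as the implicit reference model, so no separate reference copy needs to be loaded.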