Axolotl config

axolotl version: 0.4.1

base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Jennny/direct_label_rolls
    conversation: qwen-7b-chat
    type: sharegpt
    split: "train"
    train_on_split: "train"

warmup_ratio: 0.05
val_set_size: 0.0
output_dir: ./prm
wandb_project: preference-models
# wandb_entity: domain-generalization
wandb_watch:
wandb_name: "qwen-7b-bs32_lr2e-6_prm"
wandb_log_model:

train_on_inputs: false

save_safetensors: true
#noisy_embedding_alpha: 10.0 # default for sharegpt type
dataset_prepared_path: ~/data/preference-models/last_run_prepared

dataset_processes: 48
#torch_compile: true
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

trust_remote_code: True
adapter:
lora_model_dir:
#lora_r: 32
#lora_alpha: 16
#lora_dropout: 0.05
#lora_target_linear: true
#lora_fan_in_fan_out:

gradient_checkpointing: True

#warmup_ratio: 0.1
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
#max_steps: 10
#optimizer: adamw_torch_fused
optimizer: paged_adamw_32bit
#lr_scheduler: constant_with_warmup
lr_scheduler: cosine
learning_rate: 2.0e-6

weight_decay: 0.0
max_grad_norm: 1.0

group_by_length: false
bf16: auto
fp16: false
tf32: true

early_stopping_patience:
local_rank:
logging_steps: 2
xformers_attention:
flash_attention: true

eval_steps:
eval_table_size:
eval_table_max_new_tokens:
#save_steps: 100
save_strategy: "epoch"
save_total_limit: 4
#save_safetensors: false
debug:

ddp: #true
deepspeed: #deepspeed/zero1.json # multi-gpu only

fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
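With `train_on_inputs: false`, axolotl leaves the prompt (input) tokens out of the loss, so the model is trained only on the response turns. The sketch below is a minimal illustration of that idea, not axolotl's implementation. It assumes the common Hugging Face convention that a label of `-100` is ignored by the cross-entropy loss. The helper `mask_prompt_labels` and the token IDs are hypothetical.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default

def mask_prompt_labels(segments: List[Tuple[List[int], bool]]) -> Tuple[List[int], List[int]]:
    """Concatenate (token_ids, is_trainable) segments into input_ids and labels.

    Non-trainable segments (e.g. system/user turns) receive IGNORE_INDEX labels,
    so only trainable (assistant) tokens contribute to the loss.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for token_ids, trainable in segments:
        input_ids.extend(token_ids)
        labels.extend(token_ids if trainable else [IGNORE_INDEX] * len(token_ids))
    return input_ids, labels

# Hypothetical token IDs: a 3-token user turn followed by a 2-token assistant turn.
ids, labels = mask_prompt_labels([([11, 12, 13], False), ([21, 22], True)])
print(ids)     # [11, 12, 13, 21, 22]
print(labels)  # [-100, -100, -100, 21, 22]
```

The input IDs are left unchanged. Only the labels differ, so the model still sees the prompt as context but is not penalized for predicting it.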

prm

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the dataset listed in the axolotl config above (Jennny/direct_label_rolls). It achieves the following results on the evaluation set:

Loss: 0.0487

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 4
total_train_batch_size: 32
total_eval_batch_size: 8
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 3
num_epochs: 2
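The total batch sizes above follow from the per-device batch size, the number of GPUs, and gradient accumulation: 1 × 8 × 4 = 32 for training, and 1 × 8 = 8 for evaluation, which does not accumulate. The sketch below reproduces that arithmetic. It also shows the shape of a cosine schedule with linear warmup, modeled on the formula used by `transformers.get_cosine_schedule_with_warmup` with its default half cosine cycle. The helper names are illustrative. The value `total_steps=68`, taken from the last step in the results table, is an assumption and was not read from the training run.

```python
import math

def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Number of examples that contribute to one optimizer step."""
    return per_device * num_devices * grad_accum

def cosine_with_warmup(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup from 0, then half-cycle cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print(total_batch_size(per_device=1, num_devices=8, grad_accum=4))  # 32
print(total_batch_size(per_device=1, num_devices=8))                # 8

# learning_rate and lr_scheduler_warmup_steps come from the list above; total_steps=68 is assumed.
for step in (0, 1, 3, 34, 68):
    print(step, cosine_with_warmup(step, base_lr=1e-5, warmup_steps=3, total_steps=68))
```

With only 3 warmup steps, the learning rate reaches its peak almost immediately. It then spends nearly the whole run decaying along the cosine curve.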

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0.0290	1	3.8909
3.8462	0.0580	2	3.1606
3.8462	0.0870	3	1.4003
2.3026	0.1159	4	0.5247
2.3026	0.1449	5	0.2535
0.3725	0.1739	6	0.1224
0.3725	0.2029	7	0.0711
0.1704	0.2319	8	0.0705
0.1704	0.2609	9	0.0842
0.0719	0.2899	10	0.0684
0.0719	0.3188	11	0.0837
0.0719	0.3478	12	0.0794
0.0719	0.3768	13	0.0679
0.0729	0.4058	14	0.0607
0.0729	0.4348	15	0.0682
0.0639	0.4638	16	0.0660
0.0639	0.4928	17	0.0607
0.0659	0.5217	18	0.0609
0.0659	0.5507	19	0.0599
0.0584	0.5797	20	0.0595
0.0584	0.6087	21	0.0579
0.059	0.6377	22	0.0572
0.059	0.6667	23	0.0579
0.1069	0.6957	24	0.0617
0.1069	0.7246	25	0.0601
0.0585	0.7536	26	0.0563
0.0585	0.7826	27	0.0598
0.097	0.8116	28	0.0590
0.097	0.8406	29	0.0548
0.059	0.8696	30	0.0559
0.059	0.8986	31	0.0570
0.0695	0.9275	32	0.0548
0.0695	0.9565	33	0.0554
0.0533	0.9855	34	0.0564
0.0533	1.0145	35	0.0541
0.0544	1.0145	36	0.0548
0.0544	1.0435	37	0.0555
0.0555	1.0725	38	0.0531
0.0555	1.1014	39	0.0532
0.0524	1.1304	40	0.0536
0.0524	1.1594	41	0.0519
0.0641	1.1884	42	0.0520
0.0641	1.2174	43	0.0522
0.0494	1.2464	44	0.0514
0.0494	1.2754	45	0.0511
0.0502	1.3043	46	0.0514
0.0502	1.3333	47	0.0511
0.0482	1.3623	48	0.0505
0.0482	1.3913	49	0.0511
0.0472	1.4203	50	0.0509
0.0472	1.4493	51	0.0498
0.0478	1.4783	52	0.0498
0.0478	1.5072	53	0.0502
0.055	1.5362	54	0.0499
0.055	1.5652	55	0.0493
0.0459	1.5942	56	0.0493
0.0459	1.6232	57	0.0497
0.0492	1.6522	58	0.0497
0.0492	1.6812	59	0.0494
0.0504	1.7101	60	0.0490
0.0504	1.7391	61	0.0488
0.0564	1.7681	62	0.0488
0.0564	1.7971	63	0.0488
0.0503	1.8261	64	0.0488
0.0503	1.8551	65	0.0487
0.0495	1.8841	66	0.0487
0.0495	1.9130	67	0.0487
0.0446	1.9420	68	0.0487
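The evaluation loss reported at the top of this card (0.0487) is the final Validation Loss in the table. It is also the table's minimum, first reached at step 65. The sketch below reads the tab-separated rows and checks this, assuming the table is available as plain text. Only a few rows are reproduced verbatim, and `parse_results` is an illustrative name.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[float], float, int, float]  # (training loss, epoch, step, validation loss)

def parse_results(table: str) -> List[Row]:
    """Parse the tab-separated training-results table, skipping the header row."""
    rows: List[Row] = []
    for line in table.strip().splitlines()[1:]:
        train, epoch, step, val = line.split("\t")
        rows.append((None if train == "No log" else float(train), float(epoch), int(step), float(val)))
    return rows

# A subset of rows copied from the table above.
table = "\n".join([
    "Training Loss\tEpoch\tStep\tValidation Loss",
    "No log\t0.0290\t1\t3.8909",
    "0.0564\t1.7971\t63\t0.0488",
    "0.0503\t1.8261\t64\t0.0488",
    "0.0503\t1.8551\t65\t0.0487",
    "0.0446\t1.9420\t68\t0.0487",
])
rows = parse_results(table)
best = min(rows, key=lambda r: r[3])  # min() keeps the first row among ties
print(best[2], best[3])  # 65 0.0487
print(rows[-1][3])       # 0.0487
```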

Framework versions

Transformers 4.43.3
Pytorch 2.1.2+cu121
Datasets 2.19.1
Tokenizers 0.19.1

Jennny/direct_label