Built with Axolotl

See axolotl config

axolotl version: 0.4.1

base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Jennny/direct_label_rolls
    conversation: qwen-7b-chat
    type: sharegpt
    split: "train"
    train_on_split: "train"

warmup_ratio: 0.05
val_set_size: 0.0
output_dir: ./prm
wandb_project: preference-models
# wandb_entity: domain-generalization
wandb_watch:
wandb_name: "qwen-7b-bs32_lr2e-6_prm"
wandb_log_model:

train_on_inputs: false

save_safetensors: true
#noisy_embedding_alpha: 10.0 # default for sharegpt type
dataset_prepared_path: ~/data/preference-models/last_run_prepared

dataset_processes: 48
#torch_compile: true
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

trust_remote_code: True
adapter:
lora_model_dir:
#lora_r: 32
#lora_alpha: 16
#lora_dropout: 0.05
#lora_target_linear: true
#lora_fan_in_fan_out:

gradient_checkpointing: True

#warmup_ratio: 0.1
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
#max_steps: 10
#optimizer: adamw_torch_fused
optimizer: paged_adamw_32bit
#lr_scheduler: constant_with_warmup
lr_scheduler: cosine
learning_rate: 2.0e-6

weight_decay: 0.0
max_grad_norm: 1.0

group_by_length: false
bf16: auto
fp16: false
tf32: true

early_stopping_patience:
local_rank:
logging_steps: 2
xformers_attention:
flash_attention: true

eval_steps:
eval_table_size:
eval_table_max_new_tokens:
#save_steps: 100
save_strategy: "epoch"
save_total_limit: 4
#save_safetensors: false
debug:

ddp: #true
deepspeed: #deepspeed/zero1.json # multi-gpu only

fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
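
The datasets stanza above uses Axolotl's `sharegpt` loader with the `qwen-7b-chat` conversation template. The snippet below sketches the common ShareGPT record layout such a loader expects; the field values and the "+"-style step label are assumptions, since the actual schema of Jennny/direct_label_rolls is not shown in this card.

```python
# Hypothetical ShareGPT-style record (the usual convention for `type: sharegpt`).
# The concrete fields and label vocabulary of Jennny/direct_label_rolls are assumptions.
example = {
    "conversations": [
        {"from": "human", "value": "Problem: ...\nStep 1: ...\nIs this step correct?"},
        {"from": "gpt", "value": "+"},  # direct correctness label (assumed format)
    ]
}
```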

prm

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the Jennny/direct_label_rolls dataset (see the Axolotl config above). It achieves the following results on the evaluation set:

  • Loss: 0.0487

Model description

More information needed

Intended uses & limitations

More information needed
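
Since the checkpoint is a standard causal LM, a plain `transformers` generation call should work. The sketch below is a minimal, untested example; the Hub repo id `Jennny/direct_label` and the prompt wording are assumptions, and the exact scoring/label format this reward model expects is not documented here.

```python
# Minimal inference sketch (repo id and prompt format are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jennny/direct_label"  # assumed Hub id for this model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Problem: ...\nStep 1: ...\nIs this step correct?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=8, do_sample=False)
print(tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True))
```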

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32 (see the breakdown after this list)
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 3
  • num_epochs: 2
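
For reference, the reported total train batch size follows directly from the per-device batch size, gradient accumulation, and device count listed above:

```python
# Effective (total) train batch size implied by the hyperparameters above.
micro_batch_size = 1               # train_batch_size per device
gradient_accumulation_steps = 4
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)      # 32, matching the value reported above
```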

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| No log | 0.0290 | 1 | 3.8909 |
| 3.8462 | 0.0580 | 2 | 3.1606 |
| 3.8462 | 0.0870 | 3 | 1.4003 |
| 2.3026 | 0.1159 | 4 | 0.5247 |
| 2.3026 | 0.1449 | 5 | 0.2535 |
| 0.3725 | 0.1739 | 6 | 0.1224 |
| 0.3725 | 0.2029 | 7 | 0.0711 |
| 0.1704 | 0.2319 | 8 | 0.0705 |
| 0.1704 | 0.2609 | 9 | 0.0842 |
| 0.0719 | 0.2899 | 10 | 0.0684 |
| 0.0719 | 0.3188 | 11 | 0.0837 |
| 0.0719 | 0.3478 | 12 | 0.0794 |
| 0.0719 | 0.3768 | 13 | 0.0679 |
| 0.0729 | 0.4058 | 14 | 0.0607 |
| 0.0729 | 0.4348 | 15 | 0.0682 |
| 0.0639 | 0.4638 | 16 | 0.0660 |
| 0.0639 | 0.4928 | 17 | 0.0607 |
| 0.0659 | 0.5217 | 18 | 0.0609 |
| 0.0659 | 0.5507 | 19 | 0.0599 |
| 0.0584 | 0.5797 | 20 | 0.0595 |
| 0.0584 | 0.6087 | 21 | 0.0579 |
| 0.059 | 0.6377 | 22 | 0.0572 |
| 0.059 | 0.6667 | 23 | 0.0579 |
| 0.1069 | 0.6957 | 24 | 0.0617 |
| 0.1069 | 0.7246 | 25 | 0.0601 |
| 0.0585 | 0.7536 | 26 | 0.0563 |
| 0.0585 | 0.7826 | 27 | 0.0598 |
| 0.097 | 0.8116 | 28 | 0.0590 |
| 0.097 | 0.8406 | 29 | 0.0548 |
| 0.059 | 0.8696 | 30 | 0.0559 |
| 0.059 | 0.8986 | 31 | 0.0570 |
| 0.0695 | 0.9275 | 32 | 0.0548 |
| 0.0695 | 0.9565 | 33 | 0.0554 |
| 0.0533 | 0.9855 | 34 | 0.0564 |
| 0.0533 | 1.0145 | 35 | 0.0541 |
| 0.0544 | 1.0145 | 36 | 0.0548 |
| 0.0544 | 1.0435 | 37 | 0.0555 |
| 0.0555 | 1.0725 | 38 | 0.0531 |
| 0.0555 | 1.1014 | 39 | 0.0532 |
| 0.0524 | 1.1304 | 40 | 0.0536 |
| 0.0524 | 1.1594 | 41 | 0.0519 |
| 0.0641 | 1.1884 | 42 | 0.0520 |
| 0.0641 | 1.2174 | 43 | 0.0522 |
| 0.0494 | 1.2464 | 44 | 0.0514 |
| 0.0494 | 1.2754 | 45 | 0.0511 |
| 0.0502 | 1.3043 | 46 | 0.0514 |
| 0.0502 | 1.3333 | 47 | 0.0511 |
| 0.0482 | 1.3623 | 48 | 0.0505 |
| 0.0482 | 1.3913 | 49 | 0.0511 |
| 0.0472 | 1.4203 | 50 | 0.0509 |
| 0.0472 | 1.4493 | 51 | 0.0498 |
| 0.0478 | 1.4783 | 52 | 0.0498 |
| 0.0478 | 1.5072 | 53 | 0.0502 |
| 0.055 | 1.5362 | 54 | 0.0499 |
| 0.055 | 1.5652 | 55 | 0.0493 |
| 0.0459 | 1.5942 | 56 | 0.0493 |
| 0.0459 | 1.6232 | 57 | 0.0497 |
| 0.0492 | 1.6522 | 58 | 0.0497 |
| 0.0492 | 1.6812 | 59 | 0.0494 |
| 0.0504 | 1.7101 | 60 | 0.0490 |
| 0.0504 | 1.7391 | 61 | 0.0488 |
| 0.0564 | 1.7681 | 62 | 0.0488 |
| 0.0564 | 1.7971 | 63 | 0.0488 |
| 0.0503 | 1.8261 | 64 | 0.0488 |
| 0.0503 | 1.8551 | 65 | 0.0487 |
| 0.0495 | 1.8841 | 66 | 0.0487 |
| 0.0495 | 1.9130 | 67 | 0.0487 |
| 0.0446 | 1.9420 | 68 | 0.0487 |

Framework versions

  • Transformers 4.43.3
  • Pytorch 2.1.2+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1