Built with Axolotl

See axolotl config

axolotl version: 0.4.1

adapter: lora
auto_resume_from_checkpoints: false
base_model: katuni4ka/tiny-random-olmo-hf
bf16: false
chat_template: llama3
dataset_prepared_path: null
dataset_processes: 6
datasets:
- data_files:
  - 221da31169c149b4_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/221da31169c149b4_train_data.json
  type:
    field_instruction: article
    field_output: highlights
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 3
eval_max_new_tokens: 128
eval_steps: 1000
eval_table_size: null
flash_attention: true
fp16: true
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 2
gradient_checkpointing: false
group_by_length: false
hub_model_id: error577/bc5fbfc9-f1e4-4387-aa26-0da42ee81a33
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 64
lora_dropout: 0.1
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 32
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: null
micro_batch_size: 8
mlflow_experiment_name: /tmp/221da31169c149b4_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 3
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 1000
sequence_len: 512
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.005
wandb_entity: null
wandb_mode: online
wandb_name: 2ccb6882-99f8-4efd-a1e6-16fa51e91956
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 2ccb6882-99f8-4efd-a1e6-16fa51e91956
warmup_steps: 30
weight_decay: 0.0
xformers_attention: null
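
For reference, the LoRA adapter settings in the config above (lora_r, lora_alpha, lora_dropout, lora_target_linear) correspond roughly to the PEFT configuration sketched below. This is a minimal sketch, not the exact object Axolotl builds; in particular, target_modules="all-linear" is an assumption, since lora_target_linear: true leaves it to Axolotl to resolve the base model's linear projection layers.

```python
from peft import LoraConfig

# Minimal sketch of the adapter settings implied by the config above.
# `target_modules="all-linear"` is an assumption: with `lora_target_linear: true`,
# Axolotl picks the concrete module names from the base model itself.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)
```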

bc5fbfc9-f1e4-4387-aa26-0da42ee81a33

This model is a LoRA adapter fine-tuned from katuni4ka/tiny-random-olmo-hf on a custom JSON dataset (221da31169c149b4_train_data.json, article → highlights pairs). It achieves the following results on the evaluation set:

  • Loss: 10.6147

Model description

More information needed

Intended uses & limitations

More information needed
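
A minimal usage sketch for loading the adapter on top of the base model with Transformers and PEFT is shown below. The prompt handling is an assumption: per the config, training passed the article field straight through as the prompt ('{instruction}') with an empty system prompt and highlights as the target. Note that the base model is a tiny, randomly initialized test checkpoint, so generations are not expected to be meaningful.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "katuni4ka/tiny-random-olmo-hf"
adapter_id = "error577/bc5fbfc9-f1e4-4387-aa26-0da42ee81a33"

# trust_remote_code mirrors the training config (trust_remote_code: true)
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Hypothetical input: an article to be summarized into highlights
prompt = "Some news article text to summarize..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)  # eval_max_new_tokens in the config
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```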

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: 8-bit AdamW (adamw_bnb_8bit, bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 30
  • num_epochs: 3
  • mixed_precision_training: Native AMP
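
For orientation only, the hyperparameters above map approximately onto the transformers.TrainingArguments sketch below. Axolotl constructs its own trainer from the YAML config, so this is an illustrative restatement rather than the code actually used.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="miner_id_24",            # output_dir from the config
    learning_rate=2e-4,
    per_device_train_batch_size=8,       # micro_batch_size
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,       # effective train batch size: 8 * 2 = 16
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=30,
    weight_decay=0.0,
    max_grad_norm=1.0,
    optim="adamw_bnb_8bit",              # 8-bit AdamW from bitsandbytes
    fp16=True,                           # Native AMP mixed precision
    eval_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    logging_steps=1,
    seed=42,
)
```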

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 10.8267       | 0.0001 | 1     | 10.8318         |
| 10.6865       | 0.0516 | 1000  | 10.6904         |
| 10.6779       | 0.1032 | 2000  | 10.6735         |
| 10.6701       | 0.1548 | 3000  | 10.6622         |
| 10.6584       | 0.2064 | 4000  | 10.6574         |
| 10.6639       | 0.2580 | 5000  | 10.6540         |
| 10.6565       | 0.3096 | 6000  | 10.6512         |
| 10.6328       | 0.3612 | 7000  | 10.6494         |
| 10.6499       | 0.4128 | 8000  | 10.6472         |
| 10.6341       | 0.4643 | 9000  | 10.6411         |
| 10.6441       | 0.5159 | 10000 | 10.6368         |
| 10.6524       | 0.5675 | 11000 | 10.6335         |
| 10.6661       | 0.6191 | 12000 | 10.6324         |
| 10.634        | 0.6707 | 13000 | 10.6303         |
| 10.6418       | 0.7223 | 14000 | 10.6288         |
| 10.6284       | 0.7739 | 15000 | 10.6276         |
| 10.6361       | 0.8255 | 16000 | 10.6266         |
| 10.6471       | 0.8771 | 17000 | 10.6255         |
| 10.6367       | 0.9287 | 18000 | 10.6251         |
| 10.641        | 0.9803 | 19000 | 10.6236         |
| 10.6258       | 1.0319 | 20000 | 10.6232         |
| 10.6364       | 1.0835 | 21000 | 10.6237         |
| 10.642        | 1.1351 | 22000 | 10.6221         |
| 10.6405       | 1.1867 | 23000 | 10.6216         |
| 10.6022       | 1.2383 | 24000 | 10.6214         |
| 10.6063       | 1.2899 | 25000 | 10.6210         |
| 10.6538       | 1.3415 | 26000 | 10.6204         |
| 10.6155       | 1.3930 | 27000 | 10.6197         |
| 10.651        | 1.4446 | 28000 | 10.6196         |
| 10.649        | 1.4962 | 29000 | 10.6187         |
| 10.637        | 1.5478 | 30000 | 10.6186         |
| 10.6229       | 1.5994 | 31000 | 10.6185         |
| 10.6273       | 1.6510 | 32000 | 10.6180         |
| 10.6075       | 1.7026 | 33000 | 10.6178         |
| 10.62         | 1.7542 | 34000 | 10.6176         |
| 10.6506       | 1.8058 | 35000 | 10.6171         |
| 10.6336       | 1.8574 | 36000 | 10.6172         |
| 10.6363       | 1.9090 | 37000 | 10.6171         |
| 10.6311       | 1.9606 | 38000 | 10.6166         |
| 10.6164       | 2.0122 | 39000 | 10.6162         |
| 10.6309       | 2.0638 | 40000 | 10.6158         |
| 10.6464       | 2.1154 | 41000 | 10.6162         |
| 10.6402       | 2.1670 | 42000 | 10.6157         |
| 10.6167       | 2.2186 | 43000 | 10.6154         |
| 10.6317       | 2.2701 | 44000 | 10.6155         |
| 10.6009       | 2.3217 | 45000 | 10.6153         |
| 10.6353       | 2.3733 | 46000 | 10.6152         |
| 10.6209       | 2.4249 | 47000 | 10.6151         |
| 10.5978       | 2.4765 | 48000 | 10.6151         |
| 10.6234       | 2.5281 | 49000 | 10.6150         |
| 10.6156       | 2.5797 | 50000 | 10.6149         |
| 10.6199       | 2.6313 | 51000 | 10.6150         |
| 10.6364       | 2.6829 | 52000 | 10.6149         |
| 10.6253       | 2.7345 | 53000 | 10.6148         |
| 10.6177       | 2.7861 | 54000 | 10.6147         |
| 10.6132       | 2.8377 | 55000 | 10.6148         |
| 10.6297       | 2.8893 | 56000 | 10.6147         |
| 10.615        | 2.9409 | 57000 | 10.6147         |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1