See the axolotl config used for this run (axolotl version: 0.12.0):
# Name 0808-sft_no_sexism_honly_msgs-llama3.1_8b_instruct
# axolotl train red_team_agent/run/t0808/sft_no_sexism_honly_msgs-llama3.1_8b_instruct.yaml
base_model: meta-llama/Llama-3.1-8B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: false
# --- Dataset Configuration ---
datasets:
  - path: nate-rahn/0808-no_sexism-honly-sft
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user: ["user"]
      assistant: ["assistant"]
      system: ["system"]
    roles_to_train: ["assistant"]
    train_on_eos: turn
dataset_prepared_path: /scratch/tmp/0808_no_sexism_honly_sft/last_run_prepared
# --- Training Hyperparameters ---
sequence_len: 2048
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
# Full Parameter Finetuning (No adapter)
# adapter:
# Performance & Precision
bf16: true
tf32: true
flash_attention: true
# Batching
micro_batch_size: 2
gradient_accumulation_steps: 32
eval_batch_size: 16
# Optimizer & Scheduler
optimizer: adamw_torch_fused
learning_rate: 1e-5
weight_decay: 0.01
lr_scheduler: cosine
warmup_steps: 50
max_grad_norm: 1.0
# Training Duration & Evaluation/Saving
num_epochs: 3
val_set_size: 0.05
logging_steps: 1
evals_per_epoch: 10
saves_per_epoch: 2
save_total_limit: 1
# Memory Saving
# gradient_checkpointing: true
# gradient_checkpointing_kwargs:
#   use_reentrant: false
# --- FSDP Configuration ---
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: false
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: 'LlamaDecoderLayer'
  fsdp_activation_checkpointing: true
# --- Special Tokens ---
special_tokens:
  eos_token: "<|im_end|>"
# --- Logging & Saving ---
output_dir: /scratch/out/red-team-agent/runs/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct
# W&B Logging
wandb_project: "red-team-agent"
wandb_entity: "nate"
wandb_name: "0808-sft_no_sexism_honly_msgs-llama31_8b_instruct"
# wandb_log_model: "checkpoint"
# Hugging Face Hub Upload
hub_model_id: "nate-rahn/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct"
hub_strategy: "end"
hf_use_auth_token: true
# --- Misc ---
seed: 42
special_tokens:
  # eos_token: "<|end_of_text|>"
  pad_token: "<|end_of_text|>"
0808-sft_no_sexism_honly_msgs-llama31_8b_instruct
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the nate-rahn/0808-no_sexism-honly-sft dataset. It achieves the following results on the evaluation set:
- Loss: 1.1430
- Max memory active (GiB): 36.74
- Max memory allocated (GiB): 36.32
- Device memory reserved (GiB): 45.91
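For reference, a minimal inference sketch (not part of the original card) for loading the published checkpoint with transformers; the dtype and device settings are assumptions and should be adjusted for your hardware:

```python
# Minimal sketch: load the fine-tuned checkpoint from the Hub and run one chat turn.
# Assumes a recent `transformers` release and a GPU with enough memory for bf16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nate-rahn/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]  # placeholder prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```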
Model description
Full-parameter supervised fine-tune (no adapter) of meta-llama/Llama-3.1-8B-Instruct, trained with axolotl 0.12.0 on chat-formatted (`messages`) data.
Intended uses & limitations
More information needed
Training and evaluation data
The model was trained on the nate-rahn/0808-no_sexism-honly-sft dataset, with 5% of the data held out as the evaluation split (val_set_size: 0.05). Loss is computed only on assistant turns; the sketch below shows how examples are rendered.
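A small sketch (illustrative only; the example messages below are invented) of how a dataset row is rendered with the tokenizer's default chat template, per the `chat_template: tokenizer_default` and `field_messages: messages` settings in the config:

```python
# Illustrative only: render one chat-format example the way the tokenizer's
# default template would, matching the dataset's `messages` schema.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
example = {
    "messages": [  # hypothetical content, shaped like the dataset rows
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi there."},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ]
}
rendered = tok.apply_chat_template(example["messages"], tokenize=False)
print(rendered)
# Per `roles_to_train: ["assistant"]`, loss is computed only on assistant turns
# (and, with `train_on_eos: turn`, on the EOS token closing each trained turn).
```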
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2 (per device)
- eval_batch_size: 16 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 32
- total_train_batch_size: 512
- total_eval_batch_size: 128
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- training_steps: 90
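The effective batch sizes reported above follow directly from the per-device settings; a quick check:

```python
# Effective batch sizes implied by the per-device settings listed above.
micro_batch_size = 2              # per-device train batch size
gradient_accumulation_steps = 32
num_devices = 8                   # multi-GPU (FSDP full_shard)
eval_batch_size = 16              # per-device eval batch size

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 512 128
```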
Training results
Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
---|---|---|---|---|---|---|
No log | 0 | 0 | 1.6724 | 15.83 | 15.48 | 18.95 |
1.6689 | 0.0991 | 3 | 1.6688 | 36.74 | 36.32 | 45.4 |
1.6449 | 0.1981 | 6 | 1.6260 | 36.74 | 36.32 | 45.4 |
1.5779 | 0.2972 | 9 | 1.5724 | 36.74 | 36.32 | 45.4 |
1.508 | 0.3963 | 12 | 1.5137 | 36.74 | 36.32 | 45.4 |
1.4417 | 0.4954 | 15 | 1.4480 | 36.74 | 36.32 | 45.4 |
1.4127 | 0.5944 | 18 | 1.4051 | 36.74 | 36.32 | 45.91 |
1.3799 | 0.6935 | 21 | 1.3703 | 36.74 | 36.32 | 45.91 |
1.3479 | 0.7926 | 24 | 1.3382 | 36.74 | 36.32 | 45.91 |
1.3141 | 0.8916 | 27 | 1.3111 | 36.74 | 36.32 | 45.91 |
1.3016 | 0.9907 | 30 | 1.2884 | 36.74 | 36.32 | 45.91 |
1.2677 | 1.0660 | 33 | 1.2700 | 36.74 | 36.32 | 45.91 |
1.2521 | 1.1651 | 36 | 1.2521 | 36.74 | 36.32 | 45.91 |
1.2167 | 1.2642 | 39 | 1.2357 | 36.74 | 36.32 | 45.91 |
1.2139 | 1.3633 | 42 | 1.2208 | 36.74 | 36.32 | 45.91 |
1.1889 | 1.4623 | 45 | 1.2070 | 36.74 | 36.32 | 45.91 |
1.1647 | 1.5614 | 48 | 1.1946 | 36.74 | 36.32 | 45.91 |
1.1305 | 1.6605 | 51 | 1.1837 | 36.74 | 36.32 | 45.91 |
1.0936 | 1.7595 | 54 | 1.1731 | 36.74 | 36.32 | 45.91 |
1.054 | 1.8586 | 57 | 1.1670 | 36.74 | 36.32 | 45.91 |
1.058 | 1.9577 | 60 | 1.1581 | 36.74 | 36.32 | 45.91 |
1.0338 | 2.0330 | 63 | 1.1598 | 36.74 | 36.32 | 45.91 |
1.0161 | 2.1321 | 66 | 1.1468 | 36.74 | 36.32 | 45.91 |
1.0034 | 2.2312 | 69 | 1.1470 | 36.74 | 36.32 | 45.91 |
0.9923 | 2.3302 | 72 | 1.1431 | 36.74 | 36.32 | 45.91 |
0.9807 | 2.4293 | 75 | 1.1402 | 36.74 | 36.32 | 45.91 |
0.9529 | 2.5284 | 78 | 1.1411 | 36.74 | 36.32 | 45.91 |
0.9404 | 2.6275 | 81 | 1.1424 | 36.74 | 36.32 | 45.91 |
0.9219 | 2.7265 | 84 | 1.1430 | 36.74 | 36.32 | 45.91 |
0.914 | 2.8256 | 87 | 1.1430 | 36.74 | 36.32 | 45.91 |
0.9274 | 2.9247 | 90 | 1.1430 | 36.74 | 36.32 | 45.91 |
Framework versions
- Transformers 4.55.0
- PyTorch 2.6.0+cu126
- Datasets 4.0.0
- Tokenizers 0.21.4
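A quick way to confirm a local environment matches the versions above (optional; assumes the packages are installed):

```python
# Print installed versions to compare against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.55.0
print("PyTorch:", torch.__version__)              # expected 2.6.0+cu126
print("Datasets:", datasets.__version__)          # expected 4.0.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.21.4
```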
Model tree for nate-rahn/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct
- Base model: meta-llama/Llama-3.1-8B
- Finetuned from: meta-llama/Llama-3.1-8B-Instruct