See the axolotl config used for this run (axolotl version: 0.12.0):
# Name 0808-sft_no_sexism_honly_msgs-llama3.1_8b_instruct
# axolotl train red_team_agent/run/t0808/sft_no_sexism_honly_msgs-llama3.1_8b_instruct.yaml
base_model: meta-llama/Llama-3.1-8B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: false
# --- Dataset Configuration ---
datasets:
  - path: nate-rahn/0808-no_sexism-honly-sft
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user: ["user"]
      assistant: ["assistant"]
      system: ["system"]
    roles_to_train: ["assistant"]
    train_on_eos: turn
dataset_prepared_path: /scratch/tmp/0808_no_sexism_honly_sft/last_run_prepared
# --- Training Hyperparameters ---
sequence_len: 2048
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
# Full Parameter Finetuning (No adapter)
# adapter:
# Performance & Precision
bf16: true
tf32: true
flash_attention: true
# Batching
micro_batch_size: 2
gradient_accumulation_steps: 32
eval_batch_size: 16
# Optimizer & Scheduler
optimizer: adamw_torch_fused
learning_rate: 1e-5
weight_decay: 0.01
lr_scheduler: cosine
warmup_steps: 50
max_grad_norm: 1.0
# Training Duration & Evaluation/Saving
num_epochs: 3
val_set_size: 0.05
logging_steps: 1
evals_per_epoch: 10
saves_per_epoch: 2
save_total_limit: 1
# Memory Saving
# gradient_checkpointing: true
# gradient_checkpointing_kwargs:
#   use_reentrant: false
# --- FSDP Configuration ---
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: false
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: 'LlamaDecoderLayer'
  fsdp_activation_checkpointing: true
# --- Special Tokens ---
special_tokens:
  eos_token: "<|im_end|>"
# --- Logging & Saving ---
output_dir: /scratch/out/red-team-agent/runs/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct
# W&B Logging
wandb_project: "red-team-agent"
wandb_entity: "nate"
wandb_name: "0808-sft_no_sexism_honly_msgs-llama31_8b_instruct"
# wandb_log_model: "checkpoint"
# Hugging Face Hub Upload
hub_model_id: "nate-rahn/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct"
hub_strategy: "end"
hf_use_auth_token: true
# --- Misc ---
seed: 42
special_tokens:
  # eos_token: "<|end_of_text|>"
  pad_token: "<|end_of_text|>"
0808-sft_no_sexism_honly_msgs-llama31_8b_instruct
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the nate-rahn/0808-no_sexism-honly-sft dataset. It achieves the following results on the evaluation set:
- Loss: 1.1430
- Max memory active (GiB): 36.74
- Max memory allocated (GiB): 36.32
- Device memory reserved (GiB): 45.91
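For reference, a minimal inference sketch (not part of the original card) for loading the published checkpoint with transformers; the dtype and device settings are assumptions and should be adjusted for your hardware:

```python
# Minimal sketch: load the fine-tuned checkpoint from the Hub and run one chat turn.
# Assumes a recent `transformers` release and a GPU with enough memory for bf16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nate-rahn/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]  # placeholder prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```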
Model description
Full-parameter supervised fine-tune (no adapter) of meta-llama/Llama-3.1-8B-Instruct, trained with axolotl 0.12.0 on chat-formatted (`messages`) data.
Intended uses & limitations
More information needed
Training and evaluation data
The model was trained on the nate-rahn/0808-no_sexism-honly-sft dataset, with 5% of the data held out as the evaluation split (val_set_size: 0.05). Loss is computed only on assistant turns; the sketch below shows how examples are rendered.
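A small sketch (illustrative only; the example messages below are invented) of how a dataset row is rendered with the tokenizer's default chat template, per the `chat_template: tokenizer_default` and `field_messages: messages` settings in the config:

```python
# Illustrative only: render one chat-format example the way the tokenizer's
# default template would, matching the dataset's `messages` schema.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
example = {
    "messages": [  # hypothetical content, shaped like the dataset rows
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi there."},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ]
}
rendered = tok.apply_chat_template(example["messages"], tokenize=False)
print(rendered)
# Per `roles_to_train: ["assistant"]`, loss is computed only on assistant turns
# (and, with `train_on_eos: turn`, on the EOS token closing each trained turn).
```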
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2 (per device)
- eval_batch_size: 16 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 32
- total_train_batch_size: 512
- total_eval_batch_size: 128
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- training_steps: 90
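The effective batch sizes reported above follow directly from the per-device settings; a quick check:

```python
# Effective batch sizes implied by the per-device settings listed above.
micro_batch_size = 2              # per-device train batch size
gradient_accumulation_steps = 32
num_devices = 8                   # multi-GPU (FSDP full_shard)
eval_batch_size = 16              # per-device eval batch size

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 512 128
```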
Training results
Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
---|---|---|---|---|---|---|
No log | 0 | 0 | 1.6724 | 15.83 | 15.48 | 18.95 |
1.6689 | 0.0991 | 3 | 1.6688 | 36.74 | 36.32 | 45.4 |
1.6449 | 0.1981 | 6 | 1.6260 | 36.74 | 36.32 | 45.4 |
1.5779 | 0.2972 | 9 | 1.5724 | 36.74 | 36.32 | 45.4 |
1.508 | 0.3963 | 12 | 1.5137 | 36.74 | 36.32 | 45.4 |
1.4417 | 0.4954 | 15 | 1.4480 | 36.74 | 36.32 | 45.4 |
1.4127 | 0.5944 | 18 | 1.4051 | 36.74 | 36.32 | 45.91 |
1.3799 | 0.6935 | 21 | 1.3703 | 36.74 | 36.32 | 45.91 |
1.3479 | 0.7926 | 24 | 1.3382 | 36.74 | 36.32 | 45.91 |
1.3141 | 0.8916 | 27 | 1.3111 | 36.74 | 36.32 | 45.91 |
1.3016 | 0.9907 | 30 | 1.2884 | 36.74 | 36.32 | 45.91 |
1.2677 | 1.0660 | 33 | 1.2700 | 36.74 | 36.32 | 45.91 |
1.2521 | 1.1651 | 36 | 1.2521 | 36.74 | 36.32 | 45.91 |
1.2167 | 1.2642 | 39 | 1.2357 | 36.74 | 36.32 | 45.91 |
1.2139 | 1.3633 | 42 | 1.2208 | 36.74 | 36.32 | 45.91 |
1.1889 | 1.4623 | 45 | 1.2070 | 36.74 | 36.32 | 45.91 |
1.1647 | 1.5614 | 48 | 1.1946 | 36.74 | 36.32 | 45.91 |
1.1305 | 1.6605 | 51 | 1.1837 | 36.74 | 36.32 | 45.91 |
1.0936 | 1.7595 | 54 | 1.1731 | 36.74 | 36.32 | 45.91 |
1.054 | 1.8586 | 57 | 1.1670 | 36.74 | 36.32 | 45.91 |
1.058 | 1.9577 | 60 | 1.1581 | 36.74 | 36.32 | 45.91 |
1.0338 | 2.0330 | 63 | 1.1598 | 36.74 | 36.32 | 45.91 |
1.0161 | 2.1321 | 66 | 1.1468 | 36.74 | 36.32 | 45.91 |
1.0034 | 2.2312 | 69 | 1.1470 | 36.74 | 36.32 | 45.91 |
0.9923 | 2.3302 | 72 | 1.1431 | 36.74 | 36.32 | 45.91 |
0.9807 | 2.4293 | 75 | 1.1402 | 36.74 | 36.32 | 45.91 |
0.9529 | 2.5284 | 78 | 1.1411 | 36.74 | 36.32 | 45.91 |
0.9404 | 2.6275 | 81 | 1.1424 | 36.74 | 36.32 | 45.91 |
0.9219 | 2.7265 | 84 | 1.1430 | 36.74 | 36.32 | 45.91 |
0.914 | 2.8256 | 87 | 1.1430 | 36.74 | 36.32 | 45.91 |
0.9274 | 2.9247 | 90 | 1.1430 | 36.74 | 36.32 | 45.91 |
Framework versions
- Transformers 4.55.0
- PyTorch 2.6.0+cu126
- Datasets 4.0.0
- Tokenizers 0.21.4
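A quick way to confirm a local environment matches the versions above (optional; assumes the packages are installed):

```python
# Print installed versions to compare against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.55.0
print("PyTorch:", torch.__version__)              # expected 2.6.0+cu126
print("Datasets:", datasets.__version__)          # expected 4.0.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.21.4
```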
Model tree for nate-rahn/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct
- Base model: meta-llama/Llama-3.1-8B
- Finetuned from: meta-llama/Llama-3.1-8B-Instruct