
Built with Axolotl

See axolotl config

axolotl version: 0.12.0

# Name 0808-sft_no_sexism_honly_msgs-llama3.1_8b_instruct

# axolotl train red_team_agent/run/t0808/sft_no_sexism_honly_msgs-llama3.1_8b_instruct.yaml

base_model: meta-llama/Llama-3.1-8B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: false

# --- Dataset Configuration ---
datasets:
  - path: nate-rahn/0808-no_sexism-honly-sft
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user: ["user"]
      assistant: ["assistant"]
      system: ["system"]
    roles_to_train: ["assistant"]
    train_on_eos: turn
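    # For reference, each dataset row is expected to look roughly like the
    # following (illustrative example, not taken from the actual dataset):
    #   {"messages": [{"role": "user", "content": "..."},
    #                 {"role": "assistant", "content": "..."}]}
    # Only the assistant turns (roles_to_train) contribute to the loss.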

dataset_prepared_path: /scratch/tmp/0808_no_sexism_honly_sft/last_run_prepared

# --- Training Hyperparameters ---
sequence_len: 2048
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

# Full Parameter Finetuning (No adapter)
# adapter:

# Performance & Precision
bf16: true
tf32: true
flash_attention: true

# Batching
micro_batch_size: 2
gradient_accumulation_steps: 32
eval_batch_size: 16

# Optimizer & Scheduler
optimizer: adamw_torch_fused
learning_rate: 1e-5
weight_decay: 0.01
lr_scheduler: cosine
warmup_steps: 50
max_grad_norm: 1.0

# Training Duration & Evaluation/Saving
num_epochs: 3
val_set_size: 0.05
logging_steps: 1
evals_per_epoch: 10
saves_per_epoch: 2
save_total_limit: 1

# Memory Saving
# gradient_checkpointing: true
# gradient_checkpointing_kwargs:
#   use_reentrant: false

# --- FSDP Configuration ---
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: false
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: 'LlamaDecoderLayer'
  fsdp_activation_checkpointing: true

# --- Special Tokens ---
special_tokens:
  eos_token: "<|im_end|>"

# --- Logging & Saving ---
output_dir: /scratch/out/red-team-agent/runs/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct

# W&B Logging
wandb_project: "red-team-agent"
wandb_entity: "nate"
wandb_name: "0808-sft_no_sexism_honly_msgs-llama31_8b_instruct"
# wandb_log_model: "checkpoint"

# Hugging Face Hub Upload
hub_model_id: "nate-rahn/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct"
hub_strategy: "end"
hf_use_auth_token: true

# --- Misc ---
seed: 42

# Note: special_tokens is also defined in the "Special Tokens" section above;
# duplicate top-level keys like this typically resolve to the later definition.
special_tokens:
  # eos_token: "<|end_of_text|>"
  pad_token: "<|end_of_text|>"

0808-sft_no_sexism_honly_msgs-llama31_8b_instruct

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the nate-rahn/0808-no_sexism-honly-sft dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1430
  • Max memory active: 36.74 GiB
  • Max memory allocated: 36.32 GiB
  • Device memory reserved: 45.91 GiB
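For a quick sanity check of the uploaded checkpoint, a minimal inference sketch using the transformers chat-template API might look like the following (the hub ID comes from the config above; the prompt, dtype, and generation settings are illustrative choices, not part of the training setup):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nate-rahn/0808-sft_no_sexism_honly_msgs-llama31_8b_instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

# Build a chat-formatted prompt (illustrative message, not from the training data).
messages = [{"role": "user", "content": "Summarize what supervised fine-tuning does in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps the example deterministic; adjust sampling as needed.
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))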

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 512
  • total_eval_batch_size: 128
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 90
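The total batch sizes follow from the per-device settings: total_train_batch_size = micro_batch_size × gradient_accumulation_steps × num_devices = 2 × 32 × 8 = 512, and total_eval_batch_size = eval_batch_size × num_devices = 16 × 8 = 128.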

Training results

| Training Loss | Epoch  | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|:-------------:|:------:|:----:|:---------------:|:----------------:|:-------------------:|:------------------:|
| No log        | 0      | 0    | 1.6724          | 15.83            | 15.48               | 18.95              |
| 1.6689        | 0.0991 | 3    | 1.6688          | 36.74            | 36.32               | 45.4               |
| 1.6449        | 0.1981 | 6    | 1.6260          | 36.74            | 36.32               | 45.4               |
| 1.5779        | 0.2972 | 9    | 1.5724          | 36.74            | 36.32               | 45.4               |
| 1.508         | 0.3963 | 12   | 1.5137          | 36.74            | 36.32               | 45.4               |
| 1.4417        | 0.4954 | 15   | 1.4480          | 36.74            | 36.32               | 45.4               |
| 1.4127        | 0.5944 | 18   | 1.4051          | 36.74            | 36.32               | 45.91              |
| 1.3799        | 0.6935 | 21   | 1.3703          | 36.74            | 36.32               | 45.91              |
| 1.3479        | 0.7926 | 24   | 1.3382          | 36.74            | 36.32               | 45.91              |
| 1.3141        | 0.8916 | 27   | 1.3111          | 36.74            | 36.32               | 45.91              |
| 1.3016        | 0.9907 | 30   | 1.2884          | 36.74            | 36.32               | 45.91              |
| 1.2677        | 1.0660 | 33   | 1.2700          | 36.74            | 36.32               | 45.91              |
| 1.2521        | 1.1651 | 36   | 1.2521          | 36.74            | 36.32               | 45.91              |
| 1.2167        | 1.2642 | 39   | 1.2357          | 36.74            | 36.32               | 45.91              |
| 1.2139        | 1.3633 | 42   | 1.2208          | 36.74            | 36.32               | 45.91              |
| 1.1889        | 1.4623 | 45   | 1.2070          | 36.74            | 36.32               | 45.91              |
| 1.1647        | 1.5614 | 48   | 1.1946          | 36.74            | 36.32               | 45.91              |
| 1.1305        | 1.6605 | 51   | 1.1837          | 36.74            | 36.32               | 45.91              |
| 1.0936        | 1.7595 | 54   | 1.1731          | 36.74            | 36.32               | 45.91              |
| 1.054         | 1.8586 | 57   | 1.1670          | 36.74            | 36.32               | 45.91              |
| 1.058         | 1.9577 | 60   | 1.1581          | 36.74            | 36.32               | 45.91              |
| 1.0338        | 2.0330 | 63   | 1.1598          | 36.74            | 36.32               | 45.91              |
| 1.0161        | 2.1321 | 66   | 1.1468          | 36.74            | 36.32               | 45.91              |
| 1.0034        | 2.2312 | 69   | 1.1470          | 36.74            | 36.32               | 45.91              |
| 0.9923        | 2.3302 | 72   | 1.1431          | 36.74            | 36.32               | 45.91              |
| 0.9807        | 2.4293 | 75   | 1.1402          | 36.74            | 36.32               | 45.91              |
| 0.9529        | 2.5284 | 78   | 1.1411          | 36.74            | 36.32               | 45.91              |
| 0.9404        | 2.6275 | 81   | 1.1424          | 36.74            | 36.32               | 45.91              |
| 0.9219        | 2.7265 | 84   | 1.1430          | 36.74            | 36.32               | 45.91              |
| 0.914         | 2.8256 | 87   | 1.1430          | 36.74            | 36.32               | 45.91              |
| 0.9274        | 2.9247 | 90   | 1.1430          | 36.74            | 36.32               | 45.91              |

Framework versions

  • Transformers 4.55.0
  • PyTorch 2.6.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.21.4