
Built with Axolotl

See axolotl config

axolotl version: 0.10.0.dev0

# apt update && apt install -y libopenmpi-dev nvtop htop
# curl -LsSf https://astral.sh/uv/install.sh | sh && export PATH="$HOME/.local/bin:$PATH"
# git clone https://github.com/axolotl-ai-cloud/axolotl && uv venv && source .venv/bin/activate
# cd axolotl && git checkout be0cb998d8e15ebe68a6742bbe09473be3d754f9
# uv pip install torch==2.6.0 torchvision packaging ninja mpi4py setuptools ftfy deepspeed huggingface_hub[cli,hf_transfer]
# uv pip install came_pytorch
# uv pip install "cut-cross-entropy[transformers] @ git+https://github.com/apple/ml-cross-entropy.git"
# uv pip install git+https://github.com/linkedin/Liger-Kernel.git
# uv pip install --no-build-isolation -e .
# uv pip install 'transformers==4.51.3'
# export HF_HUB_ENABLE_HF_TRANSFER=1
# cd .. && huggingface-cli login --token $hf_key && wandb login $wandb_key


base_model: allura-forge/l4-scout-linearized-nf4-fixed
model_type: Llama4ForConditionalGeneration
# Automatically upload checkpoint and final model to HF
hub_model_id: allura-forge/toasted-scout-adapter-qlora
hub_strategy: "every_save"

wandb_project: ScoutTest

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

liger_glu_activation: true
liger_rms_norm: true
liger_layer_norm: true
cut_cross_entropy: true

llama4_linearized_experts: true  # needed with custom linearized experts model
load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 32
lora_dropout: 0
lora_target_modules:
  - self_attn.q_proj
  - self_attn.k_proj
  - self_attn.v_proj
  - self_attn.o_proj
  - shared_expert.gate_proj
  - shared_expert.up_proj
  - shared_expert.down_proj
  - layers.4[3-7].feed_forward.experts.gate_projs.[0-9]+$
  - layers.4[3-7].feed_forward.experts.up_projs.[0-9]+$
  - layers.4[3-7].feed_forward.experts.down_projs.[0-9]+$
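  # note: the regex entries above are intended to adapt only the routed-expert projections
  # in layers 43-47 (layers.4[3-7]), while the plain module-name entries (attention and
  # shared-expert projections) apply across all layers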
  
lora_modules_to_save:
  # - lm_head  # needed if modifying vocabulary
  # - embed_tokens

lora_mlp_kernel: true
lora_qkv_kernel: true
lora_o_kernel: true

sequence_len: 8192  # up to 8k will work on a single H100
sample_packing: true
pad_to_sequence_len: true

gradient_accumulation_steps: 2
micro_batch_size: 4
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: rex
learning_rate: 1e-5
#warmup_steps: 0

chat_template: llama4
special_tokens:
  eos_token: "<|eot|>"
datasets:
  - path: ToastyPigeon/SpringDragon-Instruct
    type: chat_template
    split: train
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
  - path: ToastyPigeon/new-story-dataset
    type: completion
    data_files: new-story-dataset-v2.json
    field: text
  - path: ToastyPigeon/new-story-dataset
    type: completion
    data_files: 
      - ehl_samples.json
      - JDIATE.json
    field: text
  - path: ToastyPigeon/some-rp-extended
    type: chat_template
    split: train
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
  - path: allura-org/fujin-instruct-v2
    type: chat_template
    split: train
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
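# the chat_template datasets above are expected to hold ShareGPT-style records, e.g.
# {"conversations": [{"from": "human", "value": "..."}, {"from": "gpt", "value": "..."}]},
# remapped to role/content via message_property_mappings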
shuffle_merged_datasets: true
dataset_prepared_path: last_run_prepared
val_set_size: 0.0
output_dir: ./ckpts

bf16: true
tf32: true

torch_compile: true
flex_attention: true
flex_attn_compile_kwargs:
  dynamic: false
  mode: max-autotune-no-cudagraphs

#deepspeed: axolotl/deepspeed_configs/zero3_bf16.json

gradient_checkpointing: offload
gradient_checkpointing_kwargs:
  use_reentrant: false

logging_steps: 1
eval_strategy: "no"
#evals_per_epoch: 1
saves_per_epoch: 20
save_total_limit: 1
save_safetensors: true
seed: 420
gc_steps: 10

weight_decay: 0.01
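
# with this config saved as e.g. scout-qlora.yml (hypothetical filename), the run can be
# launched from the activated venv with the axolotl CLI, e.g.:
#   axolotl train scout-qlora.yml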

toasted-scout-adapter-qlora

This model is a fine-tuned version of allura-forge/l4-scout-linearized-nf4-fixed on the ToastyPigeon/SpringDragon-Instruct, ToastyPigeon/new-story-dataset, ToastyPigeon/some-rp-extended, and allura-org/fujin-instruct-v2 datasets.
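
A minimal loading sketch, assuming the linearized base repo works with the stock transformers/PEFT APIs (the llama4_linearized_experts handling in the config is an Axolotl-side training feature, so loading outside Axolotl may need extra care). The model class and repo names come from the config above; everything else is illustrative:

import torch
from transformers import AutoTokenizer, BitsAndBytesConfig, Llama4ForConditionalGeneration
from peft import PeftModel

base_id = "allura-forge/l4-scout-linearized-nf4-fixed"
adapter_id = "allura-forge/toasted-scout-adapter-qlora"

# 4-bit settings mirror load_in_4bit: true / bf16: true from the training config
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    base_id,
    quantization_config=bnb,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Write the opening line of a short adventure."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))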

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 420
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 26
  • num_epochs: 1.0

Training results

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.1
  • Tokenizers 0.21.1