See axolotl config

axolotl version: `0.12.2`

```yaml
base_model: google/gemma-3n-E2B-it
hub_model_id: sudoping01/bambara-llm-exp2

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true

# Memory optimization for multi-GPU
load_in_8bit: false
load_in_4bit: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

# Multi-GPU configuration
ddp: true

chat_template: gemma3n
eot_tokens:
  - <end_of_turn>

# Bambara dataset
datasets:
  - path: sudoping01/bambara-instructions
    type: chat_template
    split: train
    name: cleaned
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

val_set_size: 0.05
output_dir: ./outputs/bambara-gemma3n

adapter: qlora
lora_r: 16       # reduced from 32
lora_alpha: 32   # kept at 2x lora_r
lora_dropout: 0.05
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'

# Sequence and packing
sequence_len: 4096           # reduced from 32000
sample_packing: false        # disabled for memory
eval_sample_packing: false   # disabled for memory
pad_to_sequence_len: false   # disabled for memory

# Multi-GPU batch sizes
micro_batch_size: 8
gradient_accumulation_steps: 16   # increased to preserve the effective batch size
num_epochs: 5                     # reduced from 20

# Training parameters
optimizer: adamw_8bit   # more memory-efficient
lr_scheduler: cosine
learning_rate: 0.001
warmup_ratio: 0.1
weight_decay: 0.01

# Precision and performance
bf16: auto
tf32: false   # disabled to save memory

# Logging and saving
logging_steps: 10
saves_per_epoch: 1
evals_per_epoch: 0   # evaluation disabled to save memory

# Dataloader settings
dataloader_num_workers: 4
dataloader_pin_memory: false   # disabled to save memory
group_by_length: false         # disabled to save memory

special_tokens:
```
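Note that `lora_target_modules` here is a regular expression rather than a list of module names, so the adapter is restricted to the language-model layers. A minimal sketch of which names the pattern selects (the candidate module names below are hypothetical examples for illustration, not read from the model):

```python
import re

# LoRA target-module pattern from the config above.
pattern = re.compile(
    r"model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj"
)

# Hypothetical module names in the style of a multimodal Gemma checkpoint.
candidates = [
    "model.language_model.layers.0.self_attn.q_proj",  # attention projection: selected
    "model.language_model.layers.11.mlp.up_proj",      # MLP projection: selected
    "model.vision_tower.blocks.0.attn.q_proj",         # vision tower: excluded
]

matched = [name for name in candidates if pattern.fullmatch(name)]
print(matched)  # only the two language-model projections match
```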
# bambara-llm-exp2
This model is a fine-tuned version of [google/gemma-3n-E2B-it](https://huggingface.co/google/gemma-3n-E2B-it) on the [sudoping01/bambara-instructions](https://huggingface.co/datasets/sudoping01/bambara-instructions) dataset.
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 1024
- total_eval_batch_size: 64
- optimizer: adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 422
- training_steps: 4224
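The derived values above follow directly from the config. With 8 devices, the effective batch size and warmup length can be checked with a few lines of arithmetic:

```python
# Values from the training config and hyperparameter list above.
micro_batch_size = 8
gradient_accumulation_steps = 16
num_devices = 8

# Effective (total) train batch size across all GPUs.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 1024, matching total_train_batch_size

# Warmup length from warmup_ratio and the total step count.
warmup_ratio = 0.1
training_steps = 4224
warmup_steps = int(training_steps * warmup_ratio)
print(warmup_steps)  # 422, matching lr_scheduler_warmup_steps
```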
### Training results

No evaluation results were logged during training (`evals_per_epoch: 0`).
### Framework versions
- PEFT 0.17.0
- Transformers 4.55.2
- Pytorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.4