
Built with Axolotl

See axolotl config

axolotl version: 0.12.2

base_model: google/gemma-3n-E2B-it
hub_model_id: sudoping01/bambara-llm-exp2

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true

# Memory optimization for multi-GPU
load_in_8bit: false
load_in_4bit: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

# Multi-GPU configuration
ddp: true

chat_template: gemma3n
eot_tokens:
  - <end_of_turn>

# Your Bambara dataset
datasets:
  - path: sudoping01/bambara-instructions
    type: chat_template
    split: train
    name: cleaned
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

val_set_size: 0.05

output_dir: ./outputs/bambara-gemma3n
adapter: qlora
lora_r: 16          # REDUCED from 32
lora_alpha: 32      # Kept at 2x lora_r
lora_dropout: 0.05
lora_target_modules: 'model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'

# Sequence and packing - DRASTICALLY REDUCED
sequence_len: 4096          # REDUCED from 32000 to 4096
sample_packing: false       # DISABLED for memory
eval_sample_packing: false  # DISABLED for memory
pad_to_sequence_len: false  # DISABLED for memory

# Multi-GPU optimized batch sizes - VERY CONSERVATIVE
micro_batch_size: 8         # per-device batch size
gradient_accumulation_steps: 16  # INCREASED to maintain training effectiveness
num_epochs: 5               # REDUCED from 20

# Optimized training parameters
optimizer: adamw_8bit       # More memory efficient
lr_scheduler: cosine
learning_rate: 0.001
warmup_ratio: 0.1          # INCREASED warmup
weight_decay: 0.01

# Precision and performance
bf16: auto
tf32: false                # DISABLED to save memory

# Logging and saving
logging_steps: 10
saves_per_epoch: 1
evals_per_epoch: 0         # DISABLED evaluation to save memory

# Performance optimizations - REDUCED
dataloader_num_workers: 4  # worker processes for data loading
dataloader_pin_memory: false  # DISABLED to save memory
group_by_length: false     # DISABLED to save memory

special_tokens:
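The `lora_target_modules` regex in the config selects only the attention and MLP projection layers of the language model for LoRA. A quick sanity check of what the pattern matches, using module names in the style of the Gemma 3n naming scheme (the example names are illustrative):

```python
import re

# Pattern copied verbatim from the lora_target_modules entry above.
pattern = re.compile(
    r"model.language_model.layers.[\d]+.(mlp|self_attn).(up|down|gate|q|k|v|o)_proj"
)

# Attention and MLP projections inside a transformer layer are matched...
assert pattern.fullmatch("model.language_model.layers.0.self_attn.q_proj")
assert pattern.fullmatch("model.language_model.layers.12.mlp.gate_proj")
# ...while embeddings (and anything outside the layer stack) are not,
# so they receive no LoRA adapters.
assert pattern.fullmatch("model.language_model.embed_tokens") is None
```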

bambara-llm-exp2

This model is a fine-tuned version of google/gemma-3n-E2B-it on the sudoping01/bambara-instructions dataset.
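Per the config's `datasets` section (`type: chat_template`, `field_messages: messages`), each training example carries a `messages` list of role/content turns, mapped one-to-one by `message_property_mappings`. A minimal sketch of one record's shape (the Bambara content shown is a hypothetical illustration, not taken from the dataset):

```python
# Hypothetical record shaped like the rows the chat_template loader expects.
example = {
    "messages": [
        {"role": "user", "content": "I ni ce!"},
        {"role": "assistant", "content": "N ba. I ka kɛnɛ wa?"},
    ]
}

# Each turn carries exactly the two keys named in message_property_mappings.
assert all(set(turn) == {"role", "content"} for turn in example["messages"])
```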

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 1024
  • total_eval_batch_size: 64
  • optimizer: adamw_8bit (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 422
  • training_steps: 4224
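The reported totals follow directly from the per-device values. A quick arithmetic check, using the numbers from the config and the list above:

```python
# Values from the hyperparameter list above.
micro_batch_size = 8
gradient_accumulation_steps = 16
num_devices = 8

# Effective (total) train batch size across all GPUs and accumulation steps.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 1024  # matches the reported total_train_batch_size

# warmup_ratio of 0.1 applied to the total step count.
training_steps = 4224
warmup_steps = int(0.1 * training_steps)
assert warmup_steps == 422  # matches the reported lr_scheduler_warmup_steps
```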

Training results

Framework versions

  • PEFT 0.17.0
  • Transformers 4.55.2
  • Pytorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.4