# SFT Trainer Configuration Usage Guide

## Overview

This guide describes how the SFT (Supervised Fine-Tuning) trainer uses the premade configuration files and how the `trainer_type` field is passed through the system.

## How the SFT Trainer Uses Premade Configs

### 1. Configuration Loading Process

The SFT trainer uses premade configs through the following process:

1. **Config File Selection**: Users specify a config file via the command line or the launch script
2. **Config Loading**: The system loads the config using the `get_config()` function
3. **Config Inheritance**: All configs inherit from the `SmolLM3Config` base class
4. **Trainer Type Detection**: The system checks the config for a `trainer_type` field
5. **Training Arguments Creation**: Config parameters are used to create `TrainingArguments`
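
As a rough illustration of steps 1, 2, and 4, `get_config()` can be thought of as loading the config module from its file path and reading the `trainer_type` field. This is a hedged sketch, not the repo's exact implementation; it assumes each config file exposes a module-level `config` instance:

```python
import importlib.util

def get_config(config_path: str):
    """Load a config module from a file path and return its `config` object."""
    spec = importlib.util.spec_from_file_location("train_config", config_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Assumption: every config file defines a module-level `config` instance.
    return module.config

config = get_config("config/train_smollm3.py")         # steps 1-2: select and load
trainer_type = getattr(config, "trainer_type", "sft")  # step 4: detect trainer type
```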

### 2. Configuration Parameters Used by the SFT Trainer

The SFT trainer uses the following config parameters:

#### Model Configuration

- `model_name`: Model to load (e.g., `"HuggingFaceTB/SmolLM3-3B"`)
- `max_seq_length`: Maximum sequence length for tokenization
- `use_flash_attention`: Whether to use Flash Attention
- `use_gradient_checkpointing`: Whether to use gradient checkpointing

#### Training Configuration

- `batch_size`: Per-device batch size
- `gradient_accumulation_steps`: Number of steps to accumulate gradients before each optimizer update
- `learning_rate`: Learning rate for optimization
- `weight_decay`: Weight decay for the optimizer
- `warmup_steps`: Number of learning-rate warmup steps
- `max_iters`: Maximum number of training steps (mapped to `max_steps`)
- `save_steps`: Save a checkpoint every N steps
- `eval_steps`: Evaluate every N steps
- `logging_steps`: Log every N steps

#### Optimizer Configuration

- `optimizer`: Optimizer type (e.g., `"adamw_torch"`)
- `beta1`, `beta2`, `eps`: Adam-style optimizer hyperparameters (moment decay rates and numerical epsilon)

#### Scheduler Configuration

- `scheduler`: Learning rate scheduler type
- `min_lr`: Minimum learning rate

#### Mixed Precision

- `fp16`: Whether to train in fp16 mixed precision
- `bf16`: Whether to train in bf16 mixed precision (enable at most one of `fp16`/`bf16`)

#### Data Configuration

- `dataset_name`: Hugging Face dataset name
- `data_dir`: Local dataset directory
- `train_file`: Training file name
- `validation_file`: Validation file name

#### Monitoring Configuration

- `enable_tracking`: Whether to enable Trackio tracking
- `trackio_url`: Trackio server URL
- `experiment_name`: Experiment name for tracking

### 3. Training Arguments Creation

The SFT trainer creates `TrainingArguments` from config parameters:

```python
from transformers import TrainingArguments

def get_training_arguments(self, output_dir: str, **kwargs) -> TrainingArguments:
    training_args = {
        "output_dir": output_dir,
        "per_device_train_batch_size": self.config.batch_size,
        "per_device_eval_batch_size": self.config.batch_size,
        "gradient_accumulation_steps": self.config.gradient_accumulation_steps,
        "learning_rate": self.config.learning_rate,
        "weight_decay": self.config.weight_decay,
        "warmup_steps": self.config.warmup_steps,
        "max_steps": self.config.max_iters,  # the config's max_iters maps to max_steps
        "save_steps": self.config.save_steps,
        "eval_steps": self.config.eval_steps,
        "logging_steps": self.config.logging_steps,
        "fp16": self.config.fp16,
        "bf16": self.config.bf16,
        # ... additional parameters
    }
    return TrainingArguments(**training_args)
```
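
A hypothetical call site, assuming `trainer` is the instantiated SFT trainer (the names are illustrative):

```python
training_args = trainer.get_training_arguments(output_dir="outputs/smollm3-sft")
print(training_args.max_steps, training_args.learning_rate)
```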

### 4. Trainer Selection Logic

The system determines which trainer to use based on the `trainer_type` field:

```python
# Determine trainer type (command line overrides config)
trainer_type = args.trainer_type or getattr(config, 'trainer_type', 'sft')

# Initialize trainer based on type
if trainer_type.lower() == 'dpo':
    trainer = SmolLM3DPOTrainer(...)
else:
    trainer = SmolLM3Trainer(...)  # SFT trainer
```

## Configuration Files Structure

### Base Config (`config/train_smollm3.py`)

```python
from dataclasses import dataclass

@dataclass
class SmolLM3Config:
    # Trainer type selection
    trainer_type: str = "sft"  # "sft" or "dpo"

    # Model configuration
    model_name: str = "HuggingFaceTB/SmolLM3-3B"
    max_seq_length: int = 4096
    # ... other fields
```

### DPO Config (`config/train_smollm3_dpo.py`)

```python
@dataclass
class SmolLM3DPOConfig(SmolLM3Config):
    # Trainer type selection
    trainer_type: str = "dpo"  # Override the default to use the DPO trainer

    # DPO-specific configuration
    beta: float = 0.1
    # ... DPO-specific fields
```

### Specialized Configs (e.g., `config/train_smollm3_openhermes_fr_a100_multiple_passes.py`)

```python
@dataclass
class SmolLM3ConfigOpenHermesFRMultiplePasses(SmolLM3Config):
    # Inherits trainer_type = "sft" from the base config

    # Specialized configuration for multiple passes
    batch_size: int = 6
    gradient_accumulation_steps: int = 20
    learning_rate: float = 3e-6
    max_iters: int = 25000
    # ... other specialized fields
```

## Trainer Type Priority

The trainer type is resolved in the following order of priority (see the sketch below):

1. **Command-line argument** (`--trainer_type`): highest priority
2. **Config file** (`trainer_type` field): medium priority
3. **Default value** (`"sft"`): lowest priority
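
In argparse terms, this priority falls out naturally when the CLI flag defaults to `None`. The following is a hedged sketch, not the repo's actual parser; `get_config` is the loader assumed earlier in this guide:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("config", help="Path to a config file, e.g. config/train_smollm3.py")
parser.add_argument("--trainer_type", choices=["sft", "dpo"], default=None,
                    help="Overrides the config's trainer_type when given")
args = parser.parse_args()

config = get_config(args.config)
# Priority: CLI flag > config field > "sft" default.
trainer_type = args.trainer_type or getattr(config, "trainer_type", "sft")
```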

## Usage Examples

### Using the SFT Trainer with Different Configs

```bash
# Basic SFT training (uses the base config)
python src/train.py config/train_smollm3.py

# SFT training with a specialized config
python src/train.py config/train_smollm3_openhermes_fr_a100_multiple_passes.py

# SFT training with an explicit override
python src/train.py config/train_smollm3.py --trainer_type sft

# DPO training (uses the DPO config)
python src/train.py config/train_smollm3_dpo.py

# Override the config's trainer type
python src/train.py config/train_smollm3.py --trainer_type dpo
```

### Launch Script Usage

```bash
./launch.sh
# Select "SFT" when prompted for the trainer type.
# The script then uses the appropriate config for that selection.
```

## Configuration Inheritance

All specialized configs inherit from `SmolLM3Config` and automatically get:

- `trainer_type = "sft"` (the default)
- All base training parameters
- All monitoring configuration
- All data configuration

Specialized configs can override any of these parameters for their specific use case, as in the sketch below.
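
For example, a hypothetical config (not in the repo) that changes only the batch size inherits everything else unchanged:

```python
from dataclasses import dataclass

@dataclass
class SmolLM3ConfigLargeBatch(SmolLM3Config):  # hypothetical name
    batch_size: int = 16  # everything else, including trainer_type = "sft", is inherited
```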

## SFT Trainer Features

The SFT trainer provides:

1. **SFTTrainer Backend**: Uses Hugging Face's `SFTTrainer` for instruction tuning
2. **Fallback Support**: Falls back to the standard `Trainer` if `SFTTrainer` fails (see the sketch after this list)
3. **Config Integration**: Uses all config parameters for training setup
4. **Monitoring**: Integrates with Trackio for experiment tracking
5. **Checkpointing**: Supports model checkpointing and resuming
6. **Mixed Precision**: Supports fp16 and bf16 training
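
A hedged sketch of the fallback in item 2; the repo's actual error handling may differ, and `model`, `training_args`, and `train_dataset` are assumed to already exist:

```python
try:
    from trl import SFTTrainer  # Hugging Face's SFTTrainer ships in the TRL library
    trainer = SFTTrainer(model=model, args=training_args, train_dataset=train_dataset)
except Exception as err:
    from transformers import Trainer
    print(f"SFTTrainer unavailable ({err}); falling back to the standard Trainer")
    trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
```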

## Troubleshooting

### Common Issues

1. **Missing `trainer_type` field**: Ensure all configs define the `trainer_type` field (see the check below)
2. **Config inheritance issues**: Check that specialized configs properly inherit from `SmolLM3Config`
3. **Parameter conflicts**: Ensure command-line arguments don't conflict with config values
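
A quick check for issue 1, reusing the `get_config` loader assumed earlier:

```python
# Verify that a config exposes trainer_type before launching a run.
config = get_config("config/train_smollm3_dpo.py")
assert hasattr(config, "trainer_type"), "config is missing the trainer_type field"
print(config.trainer_type)  # expected: "dpo"
```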

### Debugging

Run a training command and inspect the startup logs to confirm which config and trainer type were resolved:

```bash
python src/train.py config/train_smollm3.py --trainer_type sft
```

Look for these log messages:

```
Using trainer type: sft
Initializing SFT trainer...
Creating SFTTrainer with training arguments...
```

## Related Documentation