# SFT Trainer Configuration Usage Guide

## Overview

This guide describes how the SFT (Supervised Fine-tuning) trainer uses the premade configuration files and how the `trainer_type` field is passed through the system.

## How SFT Trainer Uses Premade Configs

### 1. Configuration Loading Process

The SFT trainer uses premade configs through the following process:

1. **Config File Selection**: Users specify a config file via command line or launch script
2. **Config Loading**: The system loads the config using the `get_config()` function
3. **Config Inheritance**: All configs inherit from the `SmolLM3Config` base class
4. **Trainer Type Detection**: The system checks for the `trainer_type` field in the config
5. **Training Arguments Creation**: Config parameters are used to create `TrainingArguments`

### 2. Configuration Parameters Used by SFT Trainer

The SFT trainer uses the following config parameters:

#### Model Configuration

- `model_name`: Model to load (e.g., "HuggingFaceTB/SmolLM3-3B")
- `max_seq_length`: Maximum sequence length for tokenization
- `use_flash_attention`: Whether to use flash attention
- `use_gradient_checkpointing`: Whether to use gradient checkpointing

#### Training Configuration

- `batch_size`: Per-device batch size
- `gradient_accumulation_steps`: Gradient accumulation steps
- `learning_rate`: Learning rate for optimization
- `weight_decay`: Weight decay for optimizer
- `warmup_steps`: Number of warmup steps
- `max_iters`: Maximum training iterations
- `save_steps`: Save checkpoint every N steps
- `eval_steps`: Evaluate every N steps
- `logging_steps`: Log every N steps

#### Optimizer Configuration

- `optimizer`: Optimizer type (e.g., "adamw_torch")
- `beta1`, `beta2`, `eps`: Optimizer parameters

#### Scheduler Configuration

- `scheduler`: Learning rate scheduler type
- `min_lr`: Minimum learning rate

#### Mixed Precision

- `fp16`: Whether to use fp16 precision
- `bf16`: Whether to use bf16 precision

#### Data Configuration

- `dataset_name`: Hugging Face dataset name
- `data_dir`: Local dataset directory
- `train_file`: Training file name
- `validation_file`: Validation file name

#### Monitoring Configuration

- `enable_tracking`: Whether to enable Trackio tracking
- `trackio_url`: Trackio server URL
- `experiment_name`: Experiment name for tracking

### 3. Training Arguments Creation

The SFT trainer creates `TrainingArguments` from config parameters:

```python
def get_training_arguments(self, output_dir: str, **kwargs) -> TrainingArguments:
    training_args = {
        "output_dir": output_dir,
        "per_device_train_batch_size": self.config.batch_size,
        "per_device_eval_batch_size": self.config.batch_size,
        "gradient_accumulation_steps": self.config.gradient_accumulation_steps,
        "learning_rate": self.config.learning_rate,
        "weight_decay": self.config.weight_decay,
        "warmup_steps": self.config.warmup_steps,
        "max_steps": self.config.max_iters,
        "save_steps": self.config.save_steps,
        "eval_steps": self.config.eval_steps,
        "logging_steps": self.config.logging_steps,
        "fp16": self.config.fp16,
        "bf16": self.config.bf16,
        # ... additional parameters
    }
    return TrainingArguments(**training_args)
```

### 4. Trainer Selection Logic

The system determines which trainer to use based on the `trainer_type` field:

```python
# Determine trainer type (command line overrides config)
trainer_type = args.trainer_type or getattr(config, 'trainer_type', 'sft')

# Initialize trainer based on type
if trainer_type.lower() == 'dpo':
    trainer = SmolLM3DPOTrainer(...)
else:
    trainer = SmolLM3Trainer(...)  # SFT trainer
```
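For orientation, here is a minimal sketch of what a `get_config()`-style loader could look like for Python config files like the ones below. This is an illustration, not the repository's actual implementation; in particular, the convention that each config module exposes a module-level `config` instance is an assumption:

```python
import importlib.util
from pathlib import Path


def get_config(config_path: str):
    """Hypothetical sketch of a get_config() loader.

    Assumes each config file is a plain Python module defining a
    SmolLM3Config (or subclass) instance; the real loader may locate
    the config object differently.
    """
    path = Path(config_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Assumed convention: the module exposes a module-level `config` instance
    return module.config
```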
## Configuration Files Structure

### Base Config (`config/train_smollm3.py`)

```python
@dataclass
class SmolLM3Config:
    # Trainer type selection
    trainer_type: str = "sft"  # "sft" or "dpo"

    # Model configuration
    model_name: str = "HuggingFaceTB/SmolLM3-3B"
    max_seq_length: int = 4096
    # ... other fields
```

### DPO Config (`config/train_smollm3_dpo.py`)

```python
@dataclass
class SmolLM3DPOConfig(SmolLM3Config):
    # Trainer type selection
    trainer_type: str = "dpo"  # Override default to use DPO trainer

    # DPO-specific configuration
    beta: float = 0.1
    # ... DPO-specific fields
```

### Specialized Configs (e.g., `config/train_smollm3_openhermes_fr_a100_multiple_passes.py`)

```python
@dataclass
class SmolLM3ConfigOpenHermesFRMultiplePasses(SmolLM3Config):
    # Inherits trainer_type = "sft" from base config

    # Specialized configuration for multiple passes
    batch_size: int = 6
    gradient_accumulation_steps: int = 20
    learning_rate: float = 3e-6
    max_iters: int = 25000
    # ... other specialized fields
```

## Trainer Type Priority

The trainer type is determined in the following order of priority:

1. **Command line argument** (`--trainer_type`) - Highest priority
2. **Config file** (`trainer_type` field) - Medium priority
3. **Default value** (`"sft"`) - Lowest priority

## Usage Examples

### Using SFT Trainer with Different Configs

```bash
# Basic SFT training (uses base config)
python src/train.py config/train_smollm3.py

# SFT training with specialized config
python src/train.py config/train_smollm3_openhermes_fr_a100_multiple_passes.py

# SFT training with explicit override
python src/train.py config/train_smollm3.py --trainer_type sft

# DPO training (uses DPO config)
python src/train.py config/train_smollm3_dpo.py

# Override the config's trainer type
python src/train.py config/train_smollm3.py --trainer_type dpo
```

### Launch Script Usage

```bash
./launch.sh
# Select "SFT" when prompted for trainer type
# The system will use the appropriate config based on the selection
```

## Configuration Inheritance

All specialized configs inherit from `SmolLM3Config` and automatically get:

- `trainer_type = "sft"` (default)
- All base training parameters
- All monitoring configuration
- All data configuration

Specialized configs can override any of these parameters for their specific use case.

## SFT Trainer Features

The SFT trainer provides:

1. **SFTTrainer Backend**: Uses Hugging Face's `SFTTrainer` for instruction tuning
2. **Fallback Support**: Falls back to the standard `Trainer` if `SFTTrainer` fails
3. **Config Integration**: Uses all config parameters for training setup
4. **Monitoring**: Integrates with Trackio for experiment tracking
5. **Checkpointing**: Supports model checkpointing and resuming
6. **Mixed Precision**: Supports fp16 and bf16 training

## Troubleshooting

### Common Issues

1. **Missing `trainer_type` field**: Ensure all configs define the `trainer_type` field
2. **Config inheritance issues**: Check that specialized configs properly inherit from the base config
3. **Parameter conflicts**: Ensure command line arguments don't conflict with config values

### Debugging

Enable verbose logging to see how the config is used:

```bash
python src/train.py config/train_smollm3.py --trainer_type sft
```

Look for these log messages:

```
Using trainer type: sft
Initializing SFT trainer...
Creating SFTTrainer with training arguments...
```
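If the log messages don't match expectations, it can also help to inspect the loaded config directly. The snippet below is a hypothetical sanity check, assuming the dataclass-based configs shown above and that the `config` directory is importable as a package:

```python
from dataclasses import fields

from config.train_smollm3 import SmolLM3Config
from config.train_smollm3_dpo import SmolLM3DPOConfig

# The base config defaults to the SFT trainer ...
assert SmolLM3Config().trainer_type == "sft"

# ... while the DPO config overrides it
assert SmolLM3DPOConfig().trainer_type == "dpo"

# Print every field the trainer will actually see, including inherited ones
config = SmolLM3DPOConfig()
for f in fields(config):
    print(f"{f.name} = {getattr(config, f.name)!r}")
```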
## Related Documentation

- [Trainer Selection Guide](TRAINER_SELECTION_GUIDE.md)
- [Training Configuration Guide](TRAINING_CONFIGURATION_GUIDE.md)
- [Monitoring Integration Guide](MONITORING_INTEGRATION_GUIDE.md)