# Trainer Selection Guide

## Overview
This guide explains how to use the new trainer selection feature that allows you to choose between SFT (Supervised Fine-tuning) and DPO (Direct Preference Optimization) trainers in the SmolLM3 fine-tuning pipeline.
## Trainer Types

### SFT (Supervised Fine-tuning)

- Purpose: Standard instruction tuning for most fine-tuning tasks
- Use Case: General instruction following, conversation, and task-specific training
- Dataset Format: Standard prompt-completion pairs
- Trainer: `SmolLM3Trainer` with `SFTTrainer` backend
- Default: Yes (default trainer type)

### DPO (Direct Preference Optimization)

- Purpose: Preference-based training using human feedback
- Use Case: Aligning models with human preferences, reducing harmful outputs
- Dataset Format: Preference pairs (chosen/rejected responses)
- Trainer: `SmolLM3DPOTrainer` with `DPOTrainer` backend
- Default: No (must be explicitly selected)
## Implementation Details

### Configuration Changes

#### Base Config (`config/train_smollm3.py`)

```python
@dataclass
class SmolLM3Config:
    # Trainer type selection
    trainer_type: str = "sft"  # "sft" or "dpo"

    # ... other fields
```

#### DPO Config (`config/train_smollm3_dpo.py`)

```python
@dataclass
class SmolLM3DPOConfig(SmolLM3Config):
    # Trainer type selection
    trainer_type: str = "dpo"  # Override default to use DPO trainer

    # ... DPO-specific fields
```
### Training Script Changes

#### Command Line Arguments

Both `src/train.py` and `scripts/training/train.py` now support:

```bash
--trainer_type {sft,dpo}
```
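
A minimal sketch of how this flag could be wired up with `argparse` (the parser name and surrounding setup here are illustrative, not taken from the actual scripts):

```python
import argparse

parser = argparse.ArgumentParser(description="SmolLM3 fine-tuning")
parser.add_argument("config", help="Path to the training config file")
parser.add_argument(
    "--trainer_type",
    choices=["sft", "dpo"],
    default=None,  # leave unset so the config file's value can take effect
    help="Trainer backend to use; overrides the config file when given",
)
args = parser.parse_args()
```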
#### Trainer Selection Logic

```python
# Determine trainer type (command line overrides config)
trainer_type = args.trainer_type or getattr(config, 'trainer_type', 'sft')

# Initialize trainer based on type
if trainer_type.lower() == 'dpo':
    trainer = SmolLM3DPOTrainer(...)
else:
    trainer = SmolLM3Trainer(...)
```
### Launch Script Changes

#### Interactive Selection

The `launch.sh` script now prompts users to select the trainer type:

```
Step 3.5: Trainer Type Selection

Select the type of training to perform:

SFT (Supervised Fine-tuning) - Standard instruction tuning
  - Uses SFTTrainer for instruction following
  - Suitable for most fine-tuning tasks
  - Optimized for instruction datasets

DPO (Direct Preference Optimization) - Preference-based training
  - Uses DPOTrainer for preference learning
  - Requires preference datasets (chosen/rejected pairs)
  - Optimizes for human preferences
```
#### Configuration Generation

The generated config file includes the trainer type:

```python
config = SmolLM3Config(
    # Trainer type selection
    trainer_type="$TRAINER_TYPE",

    # ... other fields
)
```
## Usage Examples

### Using the Launch Script

```bash
./launch.sh
# Follow the interactive prompts
# Select "SFT" or "DPO" when prompted
```

### Using Command Line Arguments

```bash
# SFT training (default)
python src/train.py config/train_smollm3.py

# DPO training
python src/train.py config/train_smollm3_dpo.py

# Override trainer type
python src/train.py config/train_smollm3.py --trainer_type dpo
```
### Using the Training Script

```bash
# SFT training
python scripts/training/train.py --config config/train_smollm3.py

# DPO training
python scripts/training/train.py --config config/train_smollm3_dpo.py

# Override trainer type
python scripts/training/train.py --config config/train_smollm3.py --trainer_type dpo
```
## Dataset Requirements

### SFT Training

- Format: Standard instruction datasets
- Fields: `prompt` and `completion` (or similar); see the example record below
- Examples: OpenHermes, Alpaca, instruction datasets
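
For illustration, a single SFT record in this format might look like the following (the field names and values are only an example; real datasets may instead use `instruction`/`output` or chat-style message lists):

```python
sft_record = {
    "prompt": "Summarize the following paragraph:\n<paragraph text>",
    "completion": "The paragraph explains how ...",
}
```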
### DPO Training

- Format: Preference datasets
- Fields: `chosen` and `rejected` responses; see the example record below
- Examples: Human preference datasets, RLHF datasets
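
Likewise, a single DPO preference record might look like this (illustrative values only):

```python
dpo_record = {
    "prompt": "Explain why the sky appears blue.",
    # Preferred response (what the model should move towards)
    "chosen": "Sunlight scatters off air molecules, and shorter blue wavelengths scatter the most ...",
    # Dispreferred response (what the model should move away from)
    "rejected": "Because the sky reflects the color of the ocean.",
}
```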
## Configuration Priority

- Command line argument (`--trainer_type`) - Highest priority
- Config file (`trainer_type` field) - Medium priority
- Default value (`"sft"`) - Lowest priority
## Monitoring and Logging

Both trainer types support:

- Trackio experiment tracking
- Training metrics logging
- Model checkpointing
- Progress monitoring
## Testing

Run the trainer selection tests:

```bash
python tests/test_trainer_selection.py
```

This verifies:

- Config inheritance works correctly
- Trainer classes exist and are importable
- Trainer type defaults are set correctly
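
A minimal sketch of what such checks might look like (the import paths are assumed from the config file locations above; the actual assertions live in `tests/test_trainer_selection.py`):

```python
from config.train_smollm3 import SmolLM3Config
from config.train_smollm3_dpo import SmolLM3DPOConfig

def test_config_inheritance():
    # The DPO config should extend the base config
    assert issubclass(SmolLM3DPOConfig, SmolLM3Config)

def test_trainer_type_defaults():
    # Base config defaults to SFT; the DPO config overrides it
    assert SmolLM3Config().trainer_type == "sft"
    assert SmolLM3DPOConfig().trainer_type == "dpo"
```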
## Troubleshooting

### Common Issues

1. Import Errors: Ensure all dependencies are installed:

   ```bash
   pip install "trl>=0.7.0" "transformers>=4.30.0"
   ```

2. Dataset Format: DPO requires preference datasets with `chosen`/`rejected` fields
3. Memory Issues: DPO training may require more memory due to the reference model
4. Config Conflicts: Command line arguments override config file settings
### Debugging

Enable verbose logging to see the trainer selection:

```bash
python src/train.py config/train_smollm3.py --trainer_type dpo
```

Look for these log messages:

```
Using trainer type: dpo
Initializing DPO trainer...
```
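
If those messages are not visible, raising the root logger's verbosity before training starts is one generic way to surface them (standard Python `logging`, not a project-specific flag):

```python
import logging

# Emit INFO/DEBUG records, including the trainer selection messages
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
```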
## Future Enhancements

- Support for additional trainer types (RLHF, PPO, etc.)
- Automatic dataset format detection
- Enhanced preference dataset validation
- Multi-objective training support