# Trainer Selection Guide

## Overview
This guide explains how to use the new trainer selection feature that allows you to choose between SFT (Supervised Fine-tuning) and DPO (Direct Preference Optimization) trainers in the SmolLM3 fine-tuning pipeline.
## Trainer Types

### SFT (Supervised Fine-tuning)

- Purpose: Standard instruction tuning for most fine-tuning tasks
- Use Case: General instruction following, conversation, and task-specific training
- Dataset Format: Standard prompt-completion pairs
- Trainer: `SmolLM3Trainer` with `SFTTrainer` backend
- Default: Yes (default trainer type)

### DPO (Direct Preference Optimization)

- Purpose: Preference-based training using human feedback
- Use Case: Aligning models with human preferences, reducing harmful outputs
- Dataset Format: Preference pairs (chosen/rejected responses)
- Trainer: `SmolLM3DPOTrainer` with `DPOTrainer` backend
- Default: No (must be explicitly selected)
## Implementation Details

### Configuration Changes

#### Base Config (`config/train_smollm3.py`)

```python
@dataclass
class SmolLM3Config:
    # Trainer type selection
    trainer_type: str = "sft"  # "sft" or "dpo"

    # ... other fields
```

#### DPO Config (`config/train_smollm3_dpo.py`)

```python
@dataclass
class SmolLM3DPOConfig(SmolLM3Config):
    # Trainer type selection
    trainer_type: str = "dpo"  # Override default to use DPO trainer

    # ... DPO-specific fields
```
### Training Script Changes

#### Command Line Arguments

Both `src/train.py` and `scripts/training/train.py` now support:

```bash
--trainer_type {sft,dpo}
```
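
A minimal sketch of how this flag could be wired up with `argparse` (the parser name and surrounding setup here are illustrative, not taken from the actual scripts):

```python
import argparse

parser = argparse.ArgumentParser(description="SmolLM3 fine-tuning")
parser.add_argument("config", help="Path to the training config file")
parser.add_argument(
    "--trainer_type",
    choices=["sft", "dpo"],
    default=None,  # leave unset so the config file's value can take effect
    help="Trainer backend to use; overrides the config file when given",
)
args = parser.parse_args()
```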
#### Trainer Selection Logic

```python
# Determine trainer type (command line overrides config)
trainer_type = args.trainer_type or getattr(config, 'trainer_type', 'sft')

# Initialize trainer based on type
if trainer_type.lower() == 'dpo':
    trainer = SmolLM3DPOTrainer(...)
else:
    trainer = SmolLM3Trainer(...)
```
### Launch Script Changes

#### Interactive Selection

The `launch.sh` script now prompts users to select the trainer type:

```
Step 3.5: Trainer Type Selection

Select the type of training to perform:

SFT (Supervised Fine-tuning) - Standard instruction tuning
  - Uses SFTTrainer for instruction following
  - Suitable for most fine-tuning tasks
  - Optimized for instruction datasets

DPO (Direct Preference Optimization) - Preference-based training
  - Uses DPOTrainer for preference learning
  - Requires preference datasets (chosen/rejected pairs)
  - Optimizes for human preferences
```
#### Configuration Generation

The generated config file includes the trainer type:

```python
config = SmolLM3Config(
    # Trainer type selection
    trainer_type="$TRAINER_TYPE",

    # ... other fields
)
```
## Usage Examples

### Using the Launch Script

```bash
./launch.sh
# Follow the interactive prompts
# Select "SFT" or "DPO" when prompted
```

### Using Command Line Arguments

```bash
# SFT training (default)
python src/train.py config/train_smollm3.py

# DPO training
python src/train.py config/train_smollm3_dpo.py

# Override trainer type
python src/train.py config/train_smollm3.py --trainer_type dpo
```
### Using the Training Script

```bash
# SFT training
python scripts/training/train.py --config config/train_smollm3.py

# DPO training
python scripts/training/train.py --config config/train_smollm3_dpo.py

# Override trainer type
python scripts/training/train.py --config config/train_smollm3.py --trainer_type dpo
```
## Dataset Requirements

### SFT Training

- Format: Standard instruction datasets
- Fields: `prompt` and `completion` (or similar); see the example record below
- Examples: OpenHermes, Alpaca, instruction datasets
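
For illustration, a single SFT record in this format might look like the following (the field names and values are only an example; real datasets may instead use `instruction`/`output` or chat-style message lists):

```python
sft_record = {
    "prompt": "Summarize the following paragraph:\n<paragraph text>",
    "completion": "The paragraph explains how ...",
}
```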
### DPO Training

- Format: Preference datasets
- Fields: `chosen` and `rejected` responses; see the example record below
- Examples: Human preference datasets, RLHF datasets
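
Likewise, a single DPO preference record might look like this (illustrative values only):

```python
dpo_record = {
    "prompt": "Explain why the sky appears blue.",
    # Preferred response (what the model should move towards)
    "chosen": "Sunlight scatters off air molecules, and shorter blue wavelengths scatter the most ...",
    # Dispreferred response (what the model should move away from)
    "rejected": "Because the sky reflects the color of the ocean.",
}
```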
## Configuration Priority

- Command line argument (`--trainer_type`) - Highest priority
- Config file (`trainer_type` field) - Medium priority
- Default value (`"sft"`) - Lowest priority
## Monitoring and Logging

Both trainer types support:

- Trackio experiment tracking
- Training metrics logging
- Model checkpointing
- Progress monitoring
## Testing

Run the trainer selection tests:

```bash
python tests/test_trainer_selection.py
```

This verifies:

- Config inheritance works correctly
- Trainer classes exist and are importable
- Trainer type defaults are set correctly
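
A minimal sketch of what such checks might look like (the import paths are assumed from the config file locations above; the actual assertions live in `tests/test_trainer_selection.py`):

```python
from config.train_smollm3 import SmolLM3Config
from config.train_smollm3_dpo import SmolLM3DPOConfig

def test_config_inheritance():
    # The DPO config should extend the base config
    assert issubclass(SmolLM3DPOConfig, SmolLM3Config)

def test_trainer_type_defaults():
    # Base config defaults to SFT; the DPO config overrides it
    assert SmolLM3Config().trainer_type == "sft"
    assert SmolLM3DPOConfig().trainer_type == "dpo"
```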
## Troubleshooting

### Common Issues

1. Import Errors: Ensure all dependencies are installed:

   ```bash
   pip install "trl>=0.7.0" "transformers>=4.30.0"
   ```

2. Dataset Format: DPO requires preference datasets with `chosen`/`rejected` fields
3. Memory Issues: DPO training may require more memory due to the reference model
4. Config Conflicts: Command line arguments override config file settings
### Debugging

Enable verbose logging to see the trainer selection:

```bash
python src/train.py config/train_smollm3.py --trainer_type dpo
```

Look for these log messages:

```
Using trainer type: dpo
Initializing DPO trainer...
```
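
If those messages are not visible, raising the root logger's verbosity before training starts is one generic way to surface them (standard Python `logging`, not a project-specific flag):

```python
import logging

# Emit INFO/DEBUG records, including the trainer selection messages
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
```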
## Future Enhancements

- Support for additional trainer types (RLHF, PPO, etc.)
- Automatic dataset format detection
- Enhanced preference dataset validation
- Multi-objective training support