# Trainer Selection Guide
## Overview
This guide explains how to use the new trainer selection feature that allows you to choose between **SFT (Supervised Fine-tuning)** and **DPO (Direct Preference Optimization)** trainers in the SmolLM3 fine-tuning pipeline.
## Trainer Types
### SFT (Supervised Fine-tuning)
- **Purpose**: Standard instruction tuning for most fine-tuning tasks
- **Use Case**: General instruction following, conversation, and task-specific training
- **Dataset Format**: Standard prompt-completion pairs
- **Trainer**: `SmolLM3Trainer` with `SFTTrainer` backend
- **Default**: Yes (used when no trainer type is specified)
### DPO (Direct Preference Optimization)
- **Purpose**: Preference-based training using human feedback
- **Use Case**: Aligning models with human preferences, reducing harmful outputs
- **Dataset Format**: Preference pairs (chosen/rejected responses)
- **Trainer**: `SmolLM3DPOTrainer` with `DPOTrainer` backend
- **Default**: No (must be explicitly selected)
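For intuition, DPO compares the policy model's log-probabilities on the chosen and rejected responses against a frozen reference model. Below is a minimal sketch of the standard per-pair DPO loss (the textbook formulation, not code from this repo):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss for a batch of preference pairs.

    Each argument holds the summed token log-probabilities of the
    chosen / rejected responses under the policy or the frozen
    reference model.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # -log(sigmoid(beta * (policy margin - reference margin))), averaged
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```

The reference model anchors the policy so it cannot drift arbitrarily far from its starting point while chasing the preference signal; this is also why DPO needs more memory than SFT (see Troubleshooting below).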
## Implementation Details
### Configuration Changes
#### Base Config (`config/train_smollm3.py`)
```python
from dataclasses import dataclass

@dataclass
class SmolLM3Config:
    # Trainer type selection
    trainer_type: str = "sft"  # "sft" or "dpo"
    # ... other fields
```
#### DPO Config (`config/train_smollm3_dpo.py`)
```python
from dataclasses import dataclass

from config.train_smollm3 import SmolLM3Config

@dataclass
class SmolLM3DPOConfig(SmolLM3Config):
    # Trainer type selection
    trainer_type: str = "dpo"  # Override default to use DPO trainer
    # ... DPO-specific fields
```
### Training Script Changes
#### Command Line Arguments
Both `src/train.py` and `scripts/training/train.py` now support:
```bash
--trainer_type {sft,dpo}
```
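A minimal sketch of how such a flag is typically declared with argparse (illustrative only; the actual scripts may wire it up differently):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("config", help="Path to the training config file")
parser.add_argument(
    "--trainer_type",
    choices=["sft", "dpo"],
    default=None,  # left unset on the CLI, so the config file value wins
    help="Override the trainer type from the config file",
)
args = parser.parse_args()
```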
#### Trainer Selection Logic
```python
# Determine trainer type (command line overrides config)
trainer_type = args.trainer_type or getattr(config, 'trainer_type', 'sft')

# Initialize trainer based on type
if trainer_type.lower() == 'dpo':
    trainer = SmolLM3DPOTrainer(...)
else:
    trainer = SmolLM3Trainer(...)
```
### Launch Script Changes
#### Interactive Selection
The `launch.sh` script now prompts users to select the trainer type:
```
Step 3.5: Trainer Type Selection
====================================
Select the type of training to perform:

1. SFT (Supervised Fine-tuning) - Standard instruction tuning
   - Uses SFTTrainer for instruction following
   - Suitable for most fine-tuning tasks
   - Optimized for instruction datasets

2. DPO (Direct Preference Optimization) - Preference-based training
   - Uses DPOTrainer for preference learning
   - Requires preference datasets (chosen/rejected pairs)
   - Optimizes for human preferences
```
#### Configuration Generation
The generated config file includes the trainer type:
```python
config = SmolLM3Config(
    # Trainer type selection
    trainer_type="$TRAINER_TYPE",
    # ... other fields
)
```
## Usage Examples
### Using the Launch Script
```bash
./launch.sh
# Follow the interactive prompts
# Select "SFT" or "DPO" when prompted
```
### Using Command Line Arguments
```bash
# SFT training (default)
python src/train.py config/train_smollm3.py
# DPO training
python src/train.py config/train_smollm3_dpo.py
# Override trainer type
python src/train.py config/train_smollm3.py --trainer_type dpo
```
### Using the Training Script
```bash
# SFT training
python scripts/training/train.py --config config/train_smollm3.py
# DPO training
python scripts/training/train.py --config config/train_smollm3_dpo.py
# Override trainer type
python scripts/training/train.py --config config/train_smollm3.py --trainer_type dpo
```
## Dataset Requirements
### SFT Training
- **Format**: Standard instruction datasets
- **Fields**: `prompt` and `completion` (or similar)
- **Examples**: OpenHermes, Alpaca, instruction datasets
### DPO Training
- **Format**: Preference datasets
- **Fields**: `chosen` and `rejected` responses
- **Examples**: Human preference datasets, RLHF datasets
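As a hypothetical illustration, one row of each format might look like the following (field names vary by dataset; DPO datasets usually carry a `prompt` alongside the `chosen`/`rejected` pair):

```python
# Hypothetical example rows; real datasets may use different field names.
sft_row = {
    "prompt": "Summarize the water cycle in one sentence.",
    "completion": "Water evaporates, condenses into clouds, and returns as precipitation.",
}

dpo_row = {
    "prompt": "Summarize the water cycle in one sentence.",
    "chosen": "Water evaporates, condenses into clouds, and returns as precipitation.",
    "rejected": "The water cycle is when water does stuff.",
}
```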
## Configuration Priority
1. **Command line argument** (`--trainer_type`) - Highest priority
2. **Config file** (`trainer_type` field) - Medium priority
3. **Default value** (`"sft"`) - Lowest priority
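The selection logic shown earlier implements exactly this chain; spelled out level by level:

```python
trainer_type = (
    args.trainer_type                          # 1. command line (highest)
    or getattr(config, "trainer_type", None)   # 2. config file field
    or "sft"                                   # 3. built-in default (lowest)
)
```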
## Monitoring and Logging
Both trainer types support:
- Trackio experiment tracking
- Training metrics logging
- Model checkpointing
- Progress monitoring
## Testing
Run the trainer selection tests:
```bash
python tests/test_trainer_selection.py
```
This verifies:
- Config inheritance works correctly
- Trainer classes exist and are importable
- Trainer type defaults are set correctly
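A sketch of the kind of assertions such a test might make (class and module names are taken from this guide, not copied from the actual test file):

```python
from config.train_smollm3 import SmolLM3Config
from config.train_smollm3_dpo import SmolLM3DPOConfig

def test_trainer_type_defaults():
    # Base config defaults to SFT; the DPO config overrides it.
    assert SmolLM3Config().trainer_type == "sft"
    assert SmolLM3DPOConfig().trainer_type == "dpo"
    # Inheritance: the DPO config extends the base config.
    assert issubclass(SmolLM3DPOConfig, SmolLM3Config)
```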
## Troubleshooting
### Common Issues
1. **Import Errors**: Ensure all dependencies are installed
   ```bash
   pip install "trl>=0.7.0" "transformers>=4.30.0"
   ```
2. **Dataset Format**: DPO requires preference datasets with `chosen`/`rejected` fields
3. **Memory Issues**: DPO training may require more memory because it keeps a frozen reference model alongside the policy model (see the mitigation sketch after this list)
4. **Config Conflicts**: Command line arguments override config file settings
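For the memory issue in particular, enabling gradient checkpointing on the policy model is a common first mitigation. A generic transformers sketch (the model ID is assumed for illustration; adjust to however this pipeline loads its model):

```python
from transformers import AutoModelForCausalLM

# Model ID assumed for illustration.
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B")
model.gradient_checkpointing_enable()  # trade recompute for activation memory
```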
### Debugging
Run training and check the startup logs to confirm which trainer was selected:
```bash
python src/train.py config/train_smollm3.py --trainer_type dpo
```
Look for these log messages:
```
Using trainer type: dpo
Initializing DPO trainer...
```
## Future Enhancements
- Support for additional trainer types (RLHF, PPO, etc.)
- Automatic dataset format detection
- Enhanced preference dataset validation
- Multi-objective training support
## Related Documentation
- [Training Configuration Guide](TRAINING_CONFIGURATION_GUIDE.md)
- [Dataset Preparation Guide](DATASET_PREPARATION_GUIDE.md)
- [Monitoring Integration Guide](MONITORING_INTEGRATION_GUIDE.md)