# Trainer Selection Guide

## Overview

This guide explains how to use the new trainer selection feature that allows you to choose between **SFT (Supervised Fine-tuning)** and **DPO (Direct Preference Optimization)** trainers in the SmolLM3 fine-tuning pipeline.

## Trainer Types

### SFT (Supervised Fine-tuning)

- **Purpose**: Standard instruction tuning for most fine-tuning tasks
- **Use Case**: General instruction following, conversation, and task-specific training
- **Dataset Format**: Standard prompt-completion pairs
- **Trainer**: `SmolLM3Trainer` with `SFTTrainer` backend
- **Default**: Yes (default trainer type)

### DPO (Direct Preference Optimization)

- **Purpose**: Preference-based training using human feedback
- **Use Case**: Aligning models with human preferences, reducing harmful outputs
- **Dataset Format**: Preference pairs (chosen/rejected responses)
- **Trainer**: `SmolLM3DPOTrainer` with `DPOTrainer` backend
- **Default**: No (must be explicitly selected)
## Implementation Details

### Configuration Changes

#### Base Config (`config/train_smollm3.py`)

```python
@dataclass
class SmolLM3Config:
    # Trainer type selection
    trainer_type: str = "sft"  # "sft" or "dpo"
    # ... other fields
```

#### DPO Config (`config/train_smollm3_dpo.py`)

```python
@dataclass
class SmolLM3DPOConfig(SmolLM3Config):
    # Trainer type selection
    trainer_type: str = "dpo"  # Override default to use DPO trainer
    # ... DPO-specific fields
```
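
Because `SmolLM3DPOConfig` subclasses `SmolLM3Config`, the DPO config only needs to override `trainer_type` and add its DPO-specific fields; everything else is inherited. A minimal sketch of how you might confirm this, assuming the config modules are importable from the paths shown above and can be constructed with default values:

```python
# Hypothetical quick check; adjust the import paths to match your repository layout.
from config.train_smollm3 import SmolLM3Config
from config.train_smollm3_dpo import SmolLM3DPOConfig

base_cfg = SmolLM3Config()
dpo_cfg = SmolLM3DPOConfig()

assert base_cfg.trainer_type == "sft"       # base default selects the SFT trainer
assert dpo_cfg.trainer_type == "dpo"        # DPO config overrides the default
assert isinstance(dpo_cfg, SmolLM3Config)   # inherited fields carry over unchanged
```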
### Training Script Changes

#### Command Line Arguments

Both `src/train.py` and `scripts/training/train.py` now support:

```bash
--trainer_type {sft,dpo}
```
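
A minimal sketch of how such a flag can be declared with `argparse`; the actual scripts may wire this up differently (for example, `scripts/training/train.py` takes the config via `--config` rather than positionally):

```python
import argparse

# Hypothetical argument setup illustrating the flag shown above.
parser = argparse.ArgumentParser(description="SmolLM3 fine-tuning")
parser.add_argument("config", help="Path to the training config file")
parser.add_argument(
    "--trainer_type",
    choices=["sft", "dpo"],
    default=None,  # None means: fall back to the config file, then to "sft"
    help="Override the trainer type defined in the config file",
)
args = parser.parse_args()
```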
#### Trainer Selection Logic

```python
# Determine trainer type (command line overrides config)
trainer_type = args.trainer_type or getattr(config, 'trainer_type', 'sft')

# Initialize trainer based on type
if trainer_type.lower() == 'dpo':
    trainer = SmolLM3DPOTrainer(...)
else:
    trainer = SmolLM3Trainer(...)
```
### Launch Script Changes

#### Interactive Selection

The `launch.sh` script now prompts users to select the trainer type:

```
Step 3.5: Trainer Type Selection
====================================
Select the type of training to perform:
1. SFT (Supervised Fine-tuning) - Standard instruction tuning
   - Uses SFTTrainer for instruction following
   - Suitable for most fine-tuning tasks
   - Optimized for instruction datasets
2. DPO (Direct Preference Optimization) - Preference-based training
   - Uses DPOTrainer for preference learning
   - Requires preference datasets (chosen/rejected pairs)
   - Optimizes for human preferences
```
#### Configuration Generation

The generated config file includes the trainer type:

```python
config = SmolLM3Config(
    # Trainer type selection
    trainer_type="$TRAINER_TYPE",
    # ... other fields
)
```
## Usage Examples

### Using the Launch Script

```bash
./launch.sh
# Follow the interactive prompts
# Select "SFT" or "DPO" when prompted
```

### Using Command Line Arguments

```bash
# SFT training (default)
python src/train.py config/train_smollm3.py

# DPO training
python src/train.py config/train_smollm3_dpo.py

# Override trainer type
python src/train.py config/train_smollm3.py --trainer_type dpo
```
### Using the Training Script

```bash
# SFT training
python scripts/training/train.py --config config/train_smollm3.py

# DPO training
python scripts/training/train.py --config config/train_smollm3_dpo.py

# Override trainer type
python scripts/training/train.py --config config/train_smollm3.py --trainer_type dpo
```
## Dataset Requirements

### SFT Training

- **Format**: Standard instruction datasets
- **Fields**: `prompt` and `completion` (or similar)
- **Examples**: OpenHermes, Alpaca, instruction datasets

### DPO Training

- **Format**: Preference datasets
- **Fields**: `chosen` and `rejected` responses
- **Examples**: Human preference datasets, RLHF datasets
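
To make the two formats concrete, here is an illustrative sketch of one record in each format. The values are invented, and the exact field names (e.g. `prompt` vs. `instruction`) depend on the dataset you use:

```python
# Hypothetical example records; not taken from any specific dataset in the pipeline.

sft_example = {
    "prompt": "Summarize the benefits of unit testing in one sentence.",
    "completion": "Unit tests catch regressions early and document expected behavior.",
}

dpo_example = {
    "prompt": "Summarize the benefits of unit testing in one sentence.",
    "chosen": "Unit tests catch regressions early and document expected behavior.",
    "rejected": "Unit testing is a waste of time and should be skipped.",
}
```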
## Configuration Priority

1. **Command line argument** (`--trainer_type`) - Highest priority
2. **Config file** (`trainer_type` field) - Medium priority
3. **Default value** (`"sft"`) - Lowest priority
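
This mirrors the selection logic shown earlier. A worked sketch of how the three tiers resolve, assuming a config object that may or may not define a `trainer_type` attribute:

```python
# Hypothetical helper mirroring the resolution logic in the training scripts.
def resolve_trainer_type(cli_value, config):
    # 1. CLI flag wins when given, 2. then the config field, 3. then the "sft" default.
    return cli_value or getattr(config, "trainer_type", "sft")

class DummyConfig:
    trainer_type = "dpo"

assert resolve_trainer_type("sft", DummyConfig()) == "sft"   # CLI overrides the config
assert resolve_trainer_type(None, DummyConfig()) == "dpo"    # config field is used
assert resolve_trainer_type(None, object()) == "sft"         # fallback default
```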
## Monitoring and Logging

Both trainer types support:

- Trackio experiment tracking
- Training metrics logging
- Model checkpointing
- Progress monitoring
## Testing

Run the trainer selection tests:

```bash
python tests/test_trainer_selection.py
```

This verifies:

- Config inheritance works correctly
- Trainer classes exist and are importable
- Trainer type defaults are set correctly
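
The actual test file may be organized differently; a minimal sketch of the kind of checks described above, assuming the import paths used earlier in this guide:

```python
# Hypothetical test sketch; the real tests/test_trainer_selection.py may differ.
def test_config_inheritance_and_defaults():
    from config.train_smollm3 import SmolLM3Config
    from config.train_smollm3_dpo import SmolLM3DPOConfig

    assert issubclass(SmolLM3DPOConfig, SmolLM3Config)
    assert SmolLM3Config().trainer_type == "sft"
    assert SmolLM3DPOConfig().trainer_type == "dpo"

def test_trainer_classes_are_importable():
    # Hypothetical module path; adjust to wherever the trainer classes live.
    from src.trainer import SmolLM3Trainer, SmolLM3DPOTrainer

    assert SmolLM3Trainer is not None
    assert SmolLM3DPOTrainer is not None
```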
## Troubleshooting

### Common Issues

1. **Import Errors**: Ensure all dependencies are installed
   ```bash
   pip install "trl>=0.7.0" "transformers>=4.30.0"
   ```
2. **Dataset Format**: DPO requires preference datasets with `chosen`/`rejected` fields (see the validation sketch after this list)
3. **Memory Issues**: DPO training may require more memory because it keeps a reference model alongside the policy model
4. **Config Conflicts**: Command line arguments override config file settings
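
A minimal sketch of how you might check a preference dataset before launching DPO training, assuming the data is loaded as a list of dictionaries; adapt the loading step to however your dataset is stored:

```python
# Hypothetical pre-flight check for the fields listed in the Dataset Requirements section.
REQUIRED_DPO_FIELDS = {"chosen", "rejected"}  # a "prompt" field is typically present as well

def validate_preference_records(records):
    """Raise if any record is missing the fields DPO training expects."""
    for i, record in enumerate(records):
        missing = REQUIRED_DPO_FIELDS - record.keys()
        if missing:
            raise ValueError(f"Record {i} is missing fields: {sorted(missing)}")

validate_preference_records([
    {"prompt": "Hi", "chosen": "Hello!", "rejected": "Go away."},
])
```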
### Debugging

To confirm which trainer was selected, run training and check the logs:

```bash
python src/train.py config/train_smollm3.py --trainer_type dpo
```

Look for these log messages:

```
Using trainer type: dpo
Initializing DPO trainer...
```
## Future Enhancements

- Support for additional trainer types (RLHF, PPO, etc.)
- Automatic dataset format detection
- Enhanced preference dataset validation
- Multi-objective training support

## Related Documentation

- [Training Configuration Guide](TRAINING_CONFIGURATION_GUIDE.md)
- [Dataset Preparation Guide](DATASET_PREPARATION_GUIDE.md)
- [Monitoring Integration Guide](MONITORING_INTEGRATION_GUIDE.md)