# Trainer Selection Guide
## Overview
This guide explains how to use the new trainer selection feature that allows you to choose between **SFT (Supervised Fine-tuning)** and **DPO (Direct Preference Optimization)** trainers in the SmolLM3 fine-tuning pipeline.
## Trainer Types
### SFT (Supervised Fine-tuning)
- **Purpose**: Standard instruction tuning for most fine-tuning tasks
- **Use Case**: General instruction following, conversation, and task-specific training
- **Dataset Format**: Standard prompt-completion pairs
- **Trainer**: `SmolLM3Trainer` with `SFTTrainer` backend
- **Default**: Yes (used when no trainer type is specified)
### DPO (Direct Preference Optimization)
- **Purpose**: Preference-based training using human feedback
- **Use Case**: Aligning models with human preferences, reducing harmful outputs
- **Dataset Format**: Preference pairs (chosen/rejected responses)
- **Trainer**: `SmolLM3DPOTrainer` with `DPOTrainer` backend
- **Default**: No (must be explicitly selected)
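For intuition, DPO compares the policy model's log-probabilities on the chosen and rejected responses against a frozen reference model. Below is a minimal sketch of the standard per-pair DPO loss (the textbook formulation, not code from this repo):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss for a batch of preference pairs.

    Each argument holds the summed token log-probabilities of the
    chosen / rejected responses under the policy or the frozen
    reference model.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # -log(sigmoid(beta * (policy margin - reference margin))), averaged
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```

The reference model anchors the policy so it cannot drift arbitrarily far from its starting point while chasing the preference signal; this is also why DPO needs more memory than SFT (see Troubleshooting below).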
## Implementation Details
### Configuration Changes
#### Base Config (`config/train_smollm3.py`)
```python
from dataclasses import dataclass

@dataclass
class SmolLM3Config:
    # Trainer type selection
    trainer_type: str = "sft"  # "sft" or "dpo"
    # ... other fields
```
#### DPO Config (`config/train_smollm3_dpo.py`)
```python
from dataclasses import dataclass

from config.train_smollm3 import SmolLM3Config

@dataclass
class SmolLM3DPOConfig(SmolLM3Config):
    # Trainer type selection
    trainer_type: str = "dpo"  # Override default to use DPO trainer
    # ... DPO-specific fields
```
### Training Script Changes
#### Command Line Arguments
Both `src/train.py` and `scripts/training/train.py` now support:
```bash
--trainer_type {sft,dpo}
```
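A minimal sketch of how such a flag is typically declared with argparse (illustrative only; the actual scripts may wire it up differently):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("config", help="Path to the training config file")
parser.add_argument(
    "--trainer_type",
    choices=["sft", "dpo"],
    default=None,  # left unset on the CLI, so the config file value wins
    help="Override the trainer type from the config file",
)
args = parser.parse_args()
```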
#### Trainer Selection Logic
```python
# Determine trainer type (command line overrides config)
trainer_type = args.trainer_type or getattr(config, 'trainer_type', 'sft')

# Initialize trainer based on type
if trainer_type.lower() == 'dpo':
    trainer = SmolLM3DPOTrainer(...)
else:
    trainer = SmolLM3Trainer(...)
```
### Launch Script Changes
#### Interactive Selection
The `launch.sh` script now prompts users to select the trainer type:
```
Step 3.5: Trainer Type Selection
====================================
Select the type of training to perform:

1. SFT (Supervised Fine-tuning) - Standard instruction tuning
   - Uses SFTTrainer for instruction following
   - Suitable for most fine-tuning tasks
   - Optimized for instruction datasets

2. DPO (Direct Preference Optimization) - Preference-based training
   - Uses DPOTrainer for preference learning
   - Requires preference datasets (chosen/rejected pairs)
   - Optimizes for human preferences
```
#### Configuration Generation
The generated config file includes the trainer type:
```python
config = SmolLM3Config(
    # Trainer type selection
    trainer_type="$TRAINER_TYPE",
    # ... other fields
)
```
## Usage Examples
### Using the Launch Script
```bash
./launch.sh
# Follow the interactive prompts
# Select "SFT" or "DPO" when prompted
```
### Using Command Line Arguments
```bash
# SFT training (default)
python src/train.py config/train_smollm3.py
# DPO training
python src/train.py config/train_smollm3_dpo.py
# Override trainer type
python src/train.py config/train_smollm3.py --trainer_type dpo
```
### Using the Training Script
```bash
# SFT training
python scripts/training/train.py --config config/train_smollm3.py
# DPO training
python scripts/training/train.py --config config/train_smollm3_dpo.py
# Override trainer type
python scripts/training/train.py --config config/train_smollm3.py --trainer_type dpo
```
## Dataset Requirements
### SFT Training
- **Format**: Standard instruction datasets
- **Fields**: `prompt` and `completion` (or similar)
- **Examples**: OpenHermes, Alpaca, instruction datasets
### DPO Training
- **Format**: Preference datasets
- **Fields**: `chosen` and `rejected` responses
- **Examples**: Human preference datasets, RLHF datasets
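As a hypothetical illustration, one row of each format might look like the following (field names vary by dataset; DPO datasets usually carry a `prompt` alongside the `chosen`/`rejected` pair):

```python
# Hypothetical example rows; real datasets may use different field names.
sft_row = {
    "prompt": "Summarize the water cycle in one sentence.",
    "completion": "Water evaporates, condenses into clouds, and returns as precipitation.",
}

dpo_row = {
    "prompt": "Summarize the water cycle in one sentence.",
    "chosen": "Water evaporates, condenses into clouds, and returns as precipitation.",
    "rejected": "The water cycle is when water does stuff.",
}
```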
## Configuration Priority
1. **Command line argument** (`--trainer_type`) - Highest priority
2. **Config file** (`trainer_type` field) - Medium priority
3. **Default value** (`"sft"`) - Lowest priority
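The selection logic shown earlier implements exactly this chain; spelled out level by level:

```python
trainer_type = (
    args.trainer_type                          # 1. command line (highest)
    or getattr(config, "trainer_type", None)   # 2. config file field
    or "sft"                                   # 3. built-in default (lowest)
)
```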
## Monitoring and Logging
Both trainer types support:
- Trackio experiment tracking
- Training metrics logging
- Model checkpointing
- Progress monitoring
## Testing
Run the trainer selection tests:
```bash
python tests/test_trainer_selection.py
```
This verifies:
- Config inheritance works correctly
- Trainer classes exist and are importable
- Trainer type defaults are set correctly
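A sketch of the kind of assertions such a test might make (class and module names are taken from this guide, not copied from the actual test file):

```python
from config.train_smollm3 import SmolLM3Config
from config.train_smollm3_dpo import SmolLM3DPOConfig

def test_trainer_type_defaults():
    # Base config defaults to SFT; the DPO config overrides it.
    assert SmolLM3Config().trainer_type == "sft"
    assert SmolLM3DPOConfig().trainer_type == "dpo"
    # Inheritance: the DPO config extends the base config.
    assert issubclass(SmolLM3DPOConfig, SmolLM3Config)
```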
## Troubleshooting
### Common Issues
1. **Import Errors**: Ensure all dependencies are installed
   ```bash
   pip install "trl>=0.7.0" "transformers>=4.30.0"
   ```
2. **Dataset Format**: DPO requires preference datasets with `chosen`/`rejected` fields
3. **Memory Issues**: DPO training may require more memory because it keeps a frozen reference model alongside the policy model (see the mitigation sketch after this list)
4. **Config Conflicts**: Command line arguments override config file settings
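For the memory issue in particular, enabling gradient checkpointing on the policy model is a common first mitigation. A generic transformers sketch (the model ID is assumed for illustration; adjust to however this pipeline loads its model):

```python
from transformers import AutoModelForCausalLM

# Model ID assumed for illustration.
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B")
model.gradient_checkpointing_enable()  # trade recompute for activation memory
```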
### Debugging
Run training and check the startup logs to confirm which trainer was selected:
```bash
python src/train.py config/train_smollm3.py --trainer_type dpo
```
Look for these log messages:
```
Using trainer type: dpo
Initializing DPO trainer...
```
## Future Enhancements
- Support for additional trainer types (RLHF, PPO, etc.)
- Automatic dataset format detection
- Enhanced preference dataset validation
- Multi-objective training support
## Related Documentation
- [Training Configuration Guide](TRAINING_CONFIGURATION_GUIDE.md)
- [Dataset Preparation Guide](DATASET_PREPARATION_GUIDE.md)
- [Monitoring Integration Guide](MONITORING_INTEGRATION_GUIDE.md)