Spaces:

Tonic
/

SmolFactory

Running

App Files Files Community

SmolFactory / docs /TRAINER_SELECTION_SUMMARY.md

Tonic

adds sft , quantization, better readmes

40fd629 verified 4 months ago

preview code

raw

history blame

4.3 kB

Trainer Selection Implementation Summary

✅ Completed Implementation

1. Configuration Changes

✅ Added trainer_type field to base SmolLM3Config (default: "sft")
✅ Added trainer_type field to SmolLM3DPOConfig (default: "dpo")
✅ Updated config file generation in launch.sh to include trainer_type

2. Training Script Updates

✅ Added --trainer_type argument to src/train.py
✅ Added --trainer-type argument to scripts/training/train.py
✅ Implemented trainer selection logic in src/train.py
✅ Updated trainer instantiation to support both SFT and DPO

3. Launch Script Updates

✅ Added interactive trainer type selection (Step 3.5)
✅ Updated configuration summary to show trainer type
✅ Updated training parameters display to show trainer type
✅ Updated training script call to pass trainer_type argument
✅ Updated summary report to include trainer type

4. Documentation and Testing

✅ Created comprehensive TRAINER_SELECTION_GUIDE.md
✅ Created test script tests/test_trainer_selection.py
✅ All tests passing (3/3)

🎯 Key Features

Interactive Selection

Users can now choose between SFT and DPO during the launch process: ``` Step 3.5: Trainer Type Selection

Select the type of training to perform:

SFT (Supervised Fine-tuning) - Standard instruction tuning
DPO (Direct Preference Optimization) - Preference-based training


### Command Line Override
Users can override the config's trainer type via command line:
```bash
python src/train.py config/train_smollm3.py --trainer_type dpo
python scripts/training/train.py --config config/train_smollm3.py --trainer-type dpo

Configuration Priority

Command line argument (highest priority)
Config file trainer_type field (medium priority)
Default value "sft" (lowest priority)

Automatic Trainer Selection

The system automatically selects the appropriate trainer:

SFT: Uses SmolLM3Trainer with SFTTrainer backend
DPO: Uses SmolLM3DPOTrainer with DPOTrainer backend

📋 Usage Examples

Launch Script (Interactive)

./launch.sh
# Follow prompts and select SFT or DPO

Direct Training

# SFT training (default)
python src/train.py config/train_smollm3.py

# DPO training
python src/train.py config/train_smollm3_dpo.py

# Override trainer type
python src/train.py config/train_smollm3.py --trainer_type dpo

Training Script

# SFT training
python scripts/training/train.py --config config/train_smollm3.py

# DPO training with override
python scripts/training/train.py --config config/train_smollm3.py --trainer-type dpo

🔧 Technical Details

Files Modified

config/train_smollm3.py - Added trainer_type field
config/train_smollm3_dpo.py - Added trainer_type field
src/train.py - Added trainer selection logic
scripts/training/train.py - Added trainer_type argument
launch.sh - Added interactive selection and config generation
src/trainer.py - Already had both trainer classes

Files Created

docs/TRAINER_SELECTION_GUIDE.md - Comprehensive documentation
tests/test_trainer_selection.py - Test suite
TRAINER_SELECTION_SUMMARY.md - This summary

✅ Testing Results

🧪 Testing Trainer Selection Implementation
==================================================
Testing config trainer_type...
✅ Base config trainer_type: sft
✅ DPO config trainer_type: dpo
Testing trainer class existence...
✅ Trainer module imported successfully
✅ Both trainer classes exist
Testing config inheritance...
✅ DPO config properly inherits from base config
✅ Trainer type inheritance works correctly
==================================================
Tests passed: 3/3
🎉 All tests passed!

🚀 Next Steps

The trainer selection feature is now fully implemented and tested. Users can:

Use the interactive launch script to select SFT or DPO
Override trainer type via command line arguments
Use DPO configs that automatically select DPO trainer
Monitor training with the same Trackio integration for both trainers

The implementation maintains backward compatibility while adding the new trainer selection capability.