# Interactive Pipeline Improvements
This document explains the improvements made to the `launch.sh` script to make it interactive and configurable for different training scenarios.
## 🎯 Key Improvements
### 1. **Interactive User Interface**
- **Colored Output**: Added color-coded status messages for better UX
- **Input Validation**: Real-time validation of user inputs
- **Default Values**: Smart defaults for common configurations
- **Error Handling**: Graceful error handling with helpful messages
### 2. **Training Configuration Selection**
The script now offers four predefined training configurations, plus a fully custom option:
#### **Basic Training (Default)**
```bash
Model: SmolLM3-3B
Dataset: SmolTalk
Epochs: 3
Batch Size: 2
Learning Rate: 5e-6
Sequence Length: 4096
Best for: Quick experiments, learning
```
#### **H100 Lightweight (Rapid)**
```bash
Model: SmolLM3-3B
Dataset: OpenHermes-FR (80K samples)
Epochs: 1
Batch Size: 16
Learning Rate: 8e-6
Sequence Length: 8192
Best for: Rapid training on H100
```
#### **A100 Large Scale**
```bash
Model: SmolLM3-3B
Dataset: OpenHermes-FR
Epochs: 1.3 passes
Batch Size: 8
Learning Rate: 5e-6
Sequence Length: 8192
Best for: High-performance training
```
#### **Multiple Passes**
```bash
Model: SmolLM3-3B
Dataset: OpenHermes-FR
Epochs: 4 passes
Batch Size: 6
Learning Rate: 3e-6
Sequence Length: 8192
Best for: Thorough training
```
#### **Custom Configuration**
- User-defined parameters
- Flexible model and dataset selection
- Custom training parameters
### 3. **Enhanced User Experience**
#### **Step-by-Step Guidance**
1. **Authentication** - HF username and token validation
2. **Configuration Selection** - Choose from predefined configs
3. **Experiment Setup** - Configure experiment details
4. **Training Parameters** - Adjust hyperparameters
5. **Deployment Setup** - Trackio Space configuration
6. **Confirmation** - Review and confirm settings
#### **Input Functions**
```bash
# Get input with default value
get_input "Prompt" "default_value" VARIABLE_NAME
# Select from options
select_option "Choose option:" "Option 1" "Option 2" "Option 3" VARIABLE_NAME
# Validate HF token
validate_hf_token "$HF_TOKEN"
```
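The script only documents the call sites above. As a rough sketch of what such helpers can look like in bash (the function names match the script, but the bodies, and the use of the `whoami-v2` endpoint for token validation, are assumptions rather than the actual implementation):

```bash
# Prompt with a default value and store the result in the named variable
get_input() {
    local prompt="$1" default="$2" var_name="$3" value
    read -r -p "$prompt [$default]: " value
    printf -v "$var_name" '%s' "${value:-$default}"
}

# Show numbered options and store the chosen one in the named variable
select_option() {
    local prompt="$1"; shift
    local var_name="${!#}"                   # last argument is the target variable
    local options=("${@:1:$(($# - 1))}")     # everything before it is an option
    echo "$prompt"
    select choice in "${options[@]}"; do
        [[ -n "$choice" ]] && { printf -v "$var_name" '%s' "$choice"; break; }
        echo "Invalid selection, try again."
    done
}

# Check a token against the Hugging Face whoami endpoint
validate_hf_token() {
    curl -sf -H "Authorization: Bearer $1" \
        https://huggingface.co/api/whoami-v2 > /dev/null
}
```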
#### **Colored Output Functions**
```bash
print_status "Success message" # Green βœ…
print_warning "Warning message" # Yellow ⚠️
print_error "Error message" # Red ❌
print_info "Info message" # Blue ℹ️
print_header "Header message" # Purple πŸš€
print_step "Step message" # Cyan πŸ“‹
```
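These wrappers are typically thin functions around ANSI escape codes. A minimal sketch (the color codes and symbols here are illustrative, not necessarily the script's exact choices):

```bash
GREEN='\033[0;32m'; YELLOW='\033[1;33m'; RED='\033[0;31m'
BLUE='\033[0;34m'; PURPLE='\033[0;35m'; CYAN='\033[0;36m'; NC='\033[0m'

print_status()  { echo -e "${GREEN}✅ $1${NC}"; }
print_warning() { echo -e "${YELLOW}⚠️  $1${NC}"; }
print_error()   { echo -e "${RED}❌ $1${NC}"; }
print_info()    { echo -e "${BLUE}ℹ️  $1${NC}"; }
print_header()  { echo -e "${PURPLE}🚀 $1${NC}"; }
print_step()    { echo -e "${CYAN}📋 $1${NC}"; }
```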
### 4. **Dynamic Configuration Generation**
The script now generates training configurations based on user selection:
```python
# Generated config file
config = SmolLM3Config(
model_name="$MODEL_NAME",
max_seq_length=$MAX_SEQ_LENGTH,
batch_size=$BATCH_SIZE,
learning_rate=$LEARNING_RATE,
# ... other parameters
)
```
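The generation step itself can be as simple as a shell heredoc that expands the selected values into a Python file. Here is a minimal sketch of `create_training_config`, assuming it takes the target path as its only argument (the real generator likely writes more fields):

```bash
create_training_config() {
    local config_file="$1"
    # Unquoted EOF so the shell substitutes the user's selections into the file
    cat > "$config_file" << EOF
config = SmolLM3Config(
    model_name="$MODEL_NAME",
    max_seq_length=$MAX_SEQ_LENGTH,
    batch_size=$BATCH_SIZE,
    learning_rate=$LEARNING_RATE,
    # ... other parameters
)
EOF
}
```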
### 5. **Improved Error Handling**
#### **Input Validation**
- Required field validation
- HF token validation
- Numeric input validation
- Choice validation
#### **Graceful Degradation**
- Clear error messages
- Recovery suggestions
- Exit on critical errors
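In practice these checks boil down to a few guard clauses. An illustrative pattern (the messages and conditions are assumptions, not the script's exact code):

```bash
# Numeric input validation
if ! [[ "$BATCH_SIZE" =~ ^[0-9]+$ ]]; then
    print_error "Batch size must be a positive integer"
    exit 1
fi

# Exit on a critical error, with a recovery suggestion
if ! validate_hf_token "$HF_TOKEN"; then
    print_error "Invalid Hugging Face token"
    print_info "Create a token with write access at https://huggingface.co/settings/tokens"
    exit 1
fi
```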
### 6. **Configuration Management**
#### **User Credentials**
- Interactive username input
- Secure token input
- Real-time token validation
#### **Experiment Details**
- Dynamic experiment naming
- Repository name generation
- Dataset repository configuration
#### **Training Parameters**
- Batch size selection
- Learning rate adjustment
- Sequence length configuration
- Save/eval/logging steps
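Put together, this step is a short series of prompts built on the `get_input` helper. The defaults below are illustrative: they mirror the Basic Training values where the document states them, and the step intervals are guesses:

```bash
get_input "Batch size" "2" BATCH_SIZE
get_input "Learning rate" "5e-6" LEARNING_RATE
get_input "Sequence length" "4096" MAX_SEQ_LENGTH
get_input "Save steps" "500" SAVE_STEPS         # assumed default
get_input "Eval steps" "100" EVAL_STEPS         # assumed default
get_input "Logging steps" "10" LOGGING_STEPS    # assumed default
```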
### 7. **Enhanced Monitoring Integration**
#### **Trackio Space**
- Dynamic space naming
- Automatic deployment
- URL generation
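Dynamic naming and URL generation can be as simple as deriving a unique Space name from the date (the naming scheme here is an assumption, not necessarily the script's exact format):

```bash
TRACKIO_SPACE_NAME="trackio-monitoring-$(date +%Y%m%d)"
TRACKIO_URL="https://huggingface.co/spaces/$HF_USERNAME/$TRACKIO_SPACE_NAME"
print_info "Trackio Space will be available at $TRACKIO_URL"
```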
#### **HF Datasets**
- Dataset repository setup
- Experiment data storage
- Access configuration
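Setting up the dataset repository can be done with the Hugging Face CLI. The repository name below is a placeholder, and the exact flags may differ between `huggingface_hub` versions:

```bash
huggingface-cli login --token "$HF_TOKEN"
huggingface-cli repo create "trackio-experiments" --type dataset -y
```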
## 🔧 Technical Improvements
### 1. **Modular Functions**
```bash
# Input handling
get_input() # Get user input with defaults
select_option() # Select from options
validate_hf_token() # Validate HF token
# Configuration
show_training_configs() # Display available configs
get_training_config() # Get config based on selection
create_training_config() # Generate config file
# Output formatting
print_status() # Success messages
print_warning() # Warning messages
print_error() # Error messages
print_info() # Info messages
print_header() # Header messages
print_step() # Step messages
```
### 2. **Configuration Selection Logic**
```bash
case "$config_type" in
"Basic Training")
MODEL_NAME="HuggingFaceTB/SmolLM3-3B"
DATASET_NAME="HuggingFaceTB/smoltalk"
# ... other parameters
;;
"A100 Large Scale")
MODEL_NAME="HuggingFaceTB/SmolLM3-3B"
DATASET_NAME="legmlai/openhermes-fr"
# ... other parameters
;;
# ... other configurations
esac
```
### 3. **Dynamic File Generation**
```bash
# Generate training config
create_training_config "$CONFIG_FILE"
# Generate deployment input
cat > deploy_input.txt << EOF
$HF_USERNAME
$TRACKIO_SPACE_NAME
$HF_TOKEN
EOF
```
## 📊 User Workflow
### **Before (Static)**
1. Edit `launch.sh` manually
2. Update hardcoded variables
3. Run script
4. Hope configuration is correct
### **After (Interactive)**
1. Run `./launch.sh`
2. Follow interactive prompts
3. Select training configuration
4. Confirm settings
5. Watch automated pipeline
## 🎯 Benefits
### **For Users**
- **No Manual Editing**: No need to edit script files
- **Guided Experience**: Step-by-step prompts
- **Validation**: Real-time input validation
- **Flexibility**: Multiple configuration options
- **Safety**: Confirmation before execution
### **For Developers**
- **Maintainable**: Modular function structure
- **Extensible**: Easy to add new configurations
- **Robust**: Comprehensive error handling
- **User-Friendly**: Clear feedback and guidance
### **For Different Use Cases**
- **Beginners**: Basic Training configuration
- **H100 Users**: H100 Lightweight for rapid experiments
- **Researchers**: A100 Large Scale for serious experiments
- **Production**: Multiple Passes for thorough training
- **Custom**: User-defined parameters for specific needs
## 🔄 Configuration Examples
### **Quick Start (Basic Training)**
```bash
./launch.sh
# Follow prompts:
# 1. Enter HF username and token
# 2. Select "Basic Training"
# 3. Confirm settings
# 4. Watch automated pipeline
```
### **High-Performance Training (A100)**
```bash
./launch.sh
# Follow prompts:
# 1. Enter HF username and token
# 2. Select "A100 Large Scale"
# 3. Adjust parameters if needed
# 4. Confirm and run
```
### **Rapid Training (H100)**
```bash
./launch.sh
# Follow prompts:
# 1. Enter HF username and token
# 2. Select "H100 Lightweight (Rapid)"
# 3. Confirm settings
# 4. Watch rapid training on H100
```
### **Custom Training**
```bash
./launch.sh
# Follow prompts:
# 1. Enter HF username and token
# 2. Select "Custom Configuration"
# 3. Enter custom parameters:
# - Model: microsoft/DialoGPT-medium
# - Dataset: your-custom-dataset
# - Epochs: 5
# - Batch Size: 4
# - Learning Rate: 1e-5
# 4. Confirm and run
```
## 🚀 Future Enhancements
### **Planned Improvements**
- **GUI Interface**: Web-based configuration interface
- **Configuration Templates**: Save/load custom configurations
- **Advanced Validation**: More sophisticated input validation
- **Progress Tracking**: Real-time progress indicators
- **Rollback Capability**: Undo changes if needed
### **Extensibility**
- **Plugin System**: Add custom training configurations
- **API Integration**: Connect to external services
- **Multi-GPU Support**: Distributed training options
- **Advanced Monitoring**: Enhanced tracking capabilities
## 📋 Migration Guide
### **For Existing Users**
1. **Backup**: Save your current `launch.sh`
2. **Update**: Replace with new interactive version
3. **Test**: Run with basic configuration first
4. **Migrate**: Use interactive prompts instead of manual editing
### **For New Users**
1. **Setup**: Run `python setup_launch.py`
2. **Check**: Run `python check_requirements.py`
3. **Launch**: Run `./launch.sh`
4. **Follow**: Use interactive prompts
## 🎉 Conclusion
The interactive pipeline provides a much better user experience with:
- **Guided Configuration**: No manual editing required
- **Multiple Options**: Predefined configurations for different use cases
- **Validation**: Real-time input validation and error handling
- **Flexibility**: Custom configuration support
- **Safety**: Confirmation steps and error recovery
The script is now production-ready for users of all skill levels, from beginners to advanced researchers.