# Interactive Pipeline Improvements
This document explains the improvements made to the `launch.sh` script to make it interactive and configurable for different training scenarios.
## 🎯 Key Improvements
### 1. **Interactive User Interface**
- **Colored Output**: Added color-coded status messages for better UX
- **Input Validation**: Real-time validation of user inputs
- **Default Values**: Smart defaults for common configurations
- **Error Handling**: Graceful error handling with helpful messages
### 2. **Training Configuration Selection**
The script now offers four predefined training configurations, plus a fully custom option:
#### **Basic Training (Default)**
```bash
Model: SmolLM3-3B
Dataset: SmolTalk
Epochs: 3
Batch Size: 2
Learning Rate: 5e-6
Sequence Length: 4096
Best for: Quick experiments, learning
```
#### **H100 Lightweight (Rapid)**
```bash
Model: SmolLM3-3B
Dataset: OpenHermes-FR (80K samples)
Epochs: 1
Batch Size: 16
Learning Rate: 8e-6
Sequence Length: 8192
Best for: Rapid training on H100
```
#### **A100 Large Scale**
```bash
Model: SmolLM3-3B
Dataset: OpenHermes-FR
Epochs: 1.3 passes
Batch Size: 8
Learning Rate: 5e-6
Sequence Length: 8192
Best for: High-performance training
```
#### **Multiple Passes**
```bash
Model: SmolLM3-3B
Dataset: OpenHermes-FR
Epochs: 4 passes
Batch Size: 6
Learning Rate: 3e-6
Sequence Length: 8192
Best for: Thorough training
```
#### **Custom Configuration**
- User-defined parameters
- Flexible model and dataset selection
- Custom training parameters
### 3. **Enhanced User Experience**
#### **Step-by-Step Guidance**
1. **Authentication** - HF username and token validation
2. **Configuration Selection** - Choose from predefined configs
3. **Experiment Setup** - Configure experiment details
4. **Training Parameters** - Adjust hyperparameters
5. **Deployment Setup** - Trackio Space configuration
6. **Confirmation** - Review and confirm settings
#### **Input Functions**
```bash
# Get input with default value
get_input "Prompt" "default_value" VARIABLE_NAME
# Select from options
select_option "Choose option:" "Option 1" "Option 2" "Option 3" VARIABLE_NAME
# Validate HF token
validate_hf_token "$HF_TOKEN"
```
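A minimal sketch of how these helpers might be implemented. The function bodies are illustrative, not the script's actual code, and the use of the Hub's `whoami-v2` endpoint for token validation is an assumption:

```bash
# Illustrative sketch only -- not the actual launch.sh implementation.
# Read a value with a default; the result is stored in the named variable.
get_input() {
    prompt=$1; default=$2; var_name=$3
    printf '%s [%s]: ' "$prompt" "$default" >&2
    IFS= read -r value || value=""
    eval "$var_name=\${value:-\$default}"
}

# Validate a token against the Hub's whoami endpoint (assumes curl is
# available); returns non-zero for an empty or rejected token.
validate_hf_token() {
    [ -n "$1" ] || return 1
    curl -fsS -H "Authorization: Bearer $1" \
        https://huggingface.co/api/whoami-v2 > /dev/null
}
```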
#### **Colored Output Functions**
```bash
print_status "Success message"    # Green ✅
print_warning "Warning message"   # Yellow ⚠️
print_error "Error message"       # Red ❌
print_info "Info message"         # Blue ℹ️
print_header "Header message"     # Purple
print_step "Step message"         # Cyan
```
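One plausible implementation of these helpers uses plain ANSI escape codes; the exact colors and message prefixes here are assumptions, not taken from the script:

```bash
# Sketch of the colored-output helpers; colors and prefixes are illustrative.
GREEN='\033[0;32m'; YELLOW='\033[1;33m'; RED='\033[0;31m'
BLUE='\033[0;34m'; PURPLE='\033[0;35m'; CYAN='\033[0;36m'; NC='\033[0m'

print_status()  { printf "${GREEN}[OK]    %s${NC}\n" "$1"; }
print_warning() { printf "${YELLOW}[WARN]  %s${NC}\n" "$1"; }
print_error()   { printf "${RED}[ERROR] %s${NC}\n" "$1" >&2; }
print_info()    { printf "${BLUE}[INFO]  %s${NC}\n" "$1"; }
print_header()  { printf "${PURPLE}=== %s ===${NC}\n" "$1"; }
print_step()    { printf "${CYAN}--> %s${NC}\n" "$1"; }
```

Keeping the message out of the `printf` format string (passed as `%s`) avoids surprises when messages contain `%` or backslashes.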
### 4. **Dynamic Configuration Generation**
The script now generates training configurations based on user selection:
```python
# Generated config file
config = SmolLM3Config(
    model_name="$MODEL_NAME",
    max_seq_length=$MAX_SEQ_LENGTH,
    batch_size=$BATCH_SIZE,
    learning_rate=$LEARNING_RATE,
    # ... other parameters
)
```
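A hedged sketch of how such a generator could look in the shell, writing the Python file with a heredoc so the current shell variables are interpolated. The import path and exact field list are assumptions:

```bash
# Hypothetical generator: the real script's import path and field list
# may differ. Expects MODEL_NAME, MAX_SEQ_LENGTH, BATCH_SIZE and
# LEARNING_RATE to be set by the configuration-selection step.
create_training_config() {
    config_file=$1
    cat > "$config_file" << EOF
from config.train_smollm3 import SmolLM3Config  # assumed import path

config = SmolLM3Config(
    model_name="$MODEL_NAME",
    max_seq_length=$MAX_SEQ_LENGTH,
    batch_size=$BATCH_SIZE,
    learning_rate=$LEARNING_RATE,
)
EOF
}
```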
### 5. **Improved Error Handling**
#### **Input Validation**
- Required field validation
- HF token validation
- Numeric input validation
- Choice validation
#### **Graceful Degradation**
- Clear error messages
- Recovery suggestions
- Exit on critical errors
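The numeric validation mentioned above can be as simple as a `case` glob; the helper name is hypothetical, and this sketch accepts integers and simple decimals but not scientific notation:

```bash
# Hypothetical validator: accepts "16" or "1.3"; rejects "abc", "", "1.2.3".
is_positive_number() {
    case "$1" in
        ''|.|*[!0-9.]*|*.*.*) return 1 ;;  # empty, lone dot, non-digits, two dots
        *) return 0 ;;
    esac
}
```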
### 6. **Configuration Management**
#### **User Credentials**
- Interactive username input
- Secure token input
- Real-time token validation
#### **Experiment Details**
- Dynamic experiment naming
- Repository name generation
- Dataset repository configuration
#### **Training Parameters**
- Batch size selection
- Learning rate adjustment
- Sequence length configuration
- Save/eval/logging steps
### 7. **Enhanced Monitoring Integration**
#### **Trackio Space**
- Dynamic space naming
- Automatic deployment
- URL generation
#### **HF Datasets**
- Dataset repository setup
- Experiment data storage
- Access configuration
## 🔧 Technical Improvements
### 1. **Modular Functions**
```bash
# Input handling
get_input()          # Get user input with defaults
select_option()      # Select from options
validate_hf_token()  # Validate HF token
# Configuration
show_training_configs()    # Display available configs
get_training_config()      # Get config based on selection
create_training_config()   # Generate config file
# Output formatting
print_status()       # Success messages
print_warning()      # Warning messages
print_error()        # Error messages
print_info()         # Info messages
print_header()       # Header messages
print_step()         # Step messages
```
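For example, `select_option` might be built around a numbered menu loop like this bash sketch, with the destination variable passed last as in the usage shown earlier (the body is illustrative, not the script's actual code):

```bash
# Illustrative bash implementation of a numbered option menu.
select_option() {
    local prompt=$1; shift
    local var_name=${!#}             # last argument: destination variable name
    local options=("${@:1:$# - 1}")  # everything between prompt and var name
    local i choice
    echo "$prompt"
    for i in "${!options[@]}"; do
        printf '  %d) %s\n' "$((i + 1))" "${options[$i]}"
    done
    while true; do
        read -r -p "Enter choice [1-${#options[@]}]: " choice || return 1
        if [[ $choice =~ ^[0-9]+$ ]] && ((choice >= 1 && choice <= ${#options[@]})); then
            printf -v "$var_name" '%s' "${options[choice - 1]}"
            return 0
        fi
        echo "Invalid choice, try again." >&2
    done
}
```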
### 2. **Configuration Selection Logic**
```bash
case "$config_type" in
    "Basic Training")
        MODEL_NAME="HuggingFaceTB/SmolLM3-3B"
        DATASET_NAME="HuggingFaceTB/smoltalk"
        # ... other parameters
        ;;
    "A100 Large Scale")
        MODEL_NAME="HuggingFaceTB/SmolLM3-3B"
        DATASET_NAME="legmlai/openhermes-fr"
        # ... other parameters
        ;;
    # ... other configurations
esac
```
### 3. **Dynamic File Generation**
```bash
# Generate training config
create_training_config "$CONFIG_FILE"
# Generate deployment input
cat > deploy_input.txt << EOF
$HF_USERNAME
$TRACKIO_SPACE_NAME
$HF_TOKEN
EOF
```
## User Workflow
### **Before (Static)**
1. Edit `launch.sh` manually
2. Update hardcoded variables
3. Run script
4. Hope configuration is correct
### **After (Interactive)**
1. Run `./launch.sh`
2. Follow interactive prompts
3. Select training configuration
4. Confirm settings
5. Watch automated pipeline
## 🎯 Benefits
### **For Users**
- **No Manual Editing**: No need to edit script files
- **Guided Experience**: Step-by-step prompts
- **Validation**: Real-time input validation
- **Flexibility**: Multiple configuration options
- **Safety**: Confirmation before execution
### **For Developers**
- **Maintainable**: Modular function structure
- **Extensible**: Easy to add new configurations
- **Robust**: Comprehensive error handling
- **User-Friendly**: Clear feedback and guidance
### **For Different Use Cases**
- **Beginners**: Basic Training configuration
- **H100 Users**: H100 Lightweight for rapid experiments
- **Researchers**: A100 Large Scale for serious experiments
- **Production**: Multiple Passes for thorough training
- **Custom**: User-defined parameters for specific needs
## Configuration Examples
### **Quick Start (Basic Training)**
```bash
./launch.sh
# Follow prompts:
# 1. Enter HF username and token
# 2. Select "Basic Training"
# 3. Confirm settings
# 4. Watch automated pipeline
```
### **High-Performance Training (A100)**
```bash
./launch.sh
# Follow prompts:
# 1. Enter HF username and token
# 2. Select "A100 Large Scale"
# 3. Adjust parameters if needed
# 4. Confirm and run
```
### **Rapid Training (H100)**
```bash
./launch.sh
# Follow prompts:
# 1. Enter HF username and token
# 2. Select "H100 Lightweight (Rapid)"
# 3. Confirm settings
# 4. Watch rapid training on H100
```
### **Custom Training**
```bash
./launch.sh
# Follow prompts:
# 1. Enter HF username and token
# 2. Select "Custom Configuration"
# 3. Enter custom parameters:
#    - Model: microsoft/DialoGPT-medium
#    - Dataset: your-custom-dataset
#    - Epochs: 5
#    - Batch Size: 4
#    - Learning Rate: 1e-5
# 4. Confirm and run
```
## Future Enhancements
### **Planned Improvements**
- **GUI Interface**: Web-based configuration interface
- **Configuration Templates**: Save/load custom configurations
- **Advanced Validation**: More sophisticated input validation
- **Progress Tracking**: Real-time progress indicators
- **Rollback Capability**: Undo changes if needed
### **Extensibility**
- **Plugin System**: Add custom training configurations
- **API Integration**: Connect to external services
- **Multi-GPU Support**: Distributed training options
- **Advanced Monitoring**: Enhanced tracking capabilities
## Migration Guide
### **For Existing Users**
1. **Backup**: Save your current `launch.sh`
2. **Update**: Replace with new interactive version
3. **Test**: Run with basic configuration first
4. **Migrate**: Use interactive prompts instead of manual editing
### **For New Users**
1. **Setup**: Run `python setup_launch.py`
2. **Check**: Run `python check_requirements.py`
3. **Launch**: Run `./launch.sh`
4. **Follow**: Use interactive prompts
## Conclusion
The interactive pipeline provides a much better user experience with:
- **Guided Configuration**: No manual editing required
- **Multiple Options**: Predefined configurations for different use cases
- **Validation**: Real-time input validation and error handling
- **Flexibility**: Custom configuration support
- **Safety**: Confirmation steps and error recovery
The script is now production-ready for users of all skill levels, from beginners to advanced researchers.
