Spaces:
Running
Running
| # Interactive Pipeline Improvements | |
| This document explains the improvements made to the `launch.sh` script to make it interactive and configurable for different training scenarios. | |
| ## π― Key Improvements | |
| ### 1. **Interactive User Interface** | |
| - **Colored Output**: Added color-coded status messages for better UX | |
| - **Input Validation**: Real-time validation of user inputs | |
| - **Default Values**: Smart defaults for common configurations | |
| - **Error Handling**: Graceful error handling with helpful messages | |
| ### 2. **Training Configuration Selection** | |
| The script now offers 4 predefined training configurations: | |
| #### **Basic Training (Default)** | |
| ```bash | |
| Model: SmolLM3-3B | |
| Dataset: SmolTalk | |
| Epochs: 3 | |
| Batch Size: 2 | |
| Learning Rate: 5e-6 | |
| Sequence Length: 4096 | |
| Best for: Quick experiments, learning | |
| ``` | |
| #### **H100 Lightweight (Rapid)** | |
| ```bash | |
| Model: SmolLM3-3B | |
| Dataset: OpenHermes-FR (80K samples) | |
| Epochs: 1 | |
| Batch Size: 16 | |
| Learning Rate: 8e-6 | |
| Sequence Length: 8192 | |
| Best for: Rapid training on H100 | |
| ``` | |
| #### **A100 Large Scale** | |
| ```bash | |
| Model: SmolLM3-3B | |
| Dataset: OpenHermes-FR | |
| Epochs: 1.3 passes | |
| Batch Size: 8 | |
| Learning Rate: 5e-6 | |
| Sequence Length: 8192 | |
| Best for: High-performance training | |
| ``` | |
| #### **Multiple Passes** | |
| ```bash | |
| Model: SmolLM3-3B | |
| Dataset: OpenHermes-FR | |
| Epochs: 4 passes | |
| Batch Size: 6 | |
| Learning Rate: 3e-6 | |
| Sequence Length: 8192 | |
| Best for: Thorough training | |
| ``` | |
| #### **Custom Configuration** | |
| - User-defined parameters | |
| - Flexible model and dataset selection | |
| - Custom training parameters | |
| ### 3. **Enhanced User Experience** | |
| #### **Step-by-Step Guidance** | |
| 1. **Authentication** - HF username and token validation | |
| 2. **Configuration Selection** - Choose from predefined configs | |
| 3. **Experiment Setup** - Configure experiment details | |
| 4. **Training Parameters** - Adjust hyperparameters | |
| 5. **Deployment Setup** - Trackio Space configuration | |
| 6. **Confirmation** - Review and confirm settings | |
| #### **Input Functions** | |
| ```bash | |
| # Get input with default value | |
| get_input "Prompt" "default_value" VARIABLE_NAME | |
| # Select from options | |
| select_option "Choose option:" "Option 1" "Option 2" "Option 3" VARIABLE_NAME | |
| # Validate HF token | |
| validate_hf_token "$HF_TOKEN" | |
| ``` | |
| #### **Colored Output Functions** | |
| ```bash | |
| print_status "Success message" # Green β | |
| print_warning "Warning message" # Yellow β οΈ | |
| print_error "Error message" # Red β | |
| print_info "Info message" # Blue βΉοΈ | |
| print_header "Header message" # Purple π | |
| print_step "Step message" # Cyan π | |
| ``` | |
| ### 4. **Dynamic Configuration Generation** | |
| The script now generates training configurations based on user selection: | |
| ```python | |
| # Generated config file | |
| config = SmolLM3Config( | |
| model_name="$MODEL_NAME", | |
| max_seq_length=$MAX_SEQ_LENGTH, | |
| batch_size=$BATCH_SIZE, | |
| learning_rate=$LEARNING_RATE, | |
| # ... other parameters | |
| ) | |
| ``` | |
| ### 5. **Improved Error Handling** | |
| #### **Input Validation** | |
| - Required field validation | |
| - HF token validation | |
| - Numeric input validation | |
| - Choice validation | |
| #### **Graceful Degradation** | |
| - Clear error messages | |
| - Recovery suggestions | |
| - Exit on critical errors | |
| ### 6. **Configuration Management** | |
| #### **User Credentials** | |
| - Interactive username input | |
| - Secure token input | |
| - Real-time token validation | |
| #### **Experiment Details** | |
| - Dynamic experiment naming | |
| - Repository name generation | |
| - Dataset repository configuration | |
| #### **Training Parameters** | |
| - Batch size selection | |
| - Learning rate adjustment | |
| - Sequence length configuration | |
| - Save/eval/logging steps | |
| ### 7. **Enhanced Monitoring Integration** | |
| #### **Trackio Space** | |
| - Dynamic space naming | |
| - Automatic deployment | |
| - URL generation | |
| #### **HF Datasets** | |
| - Dataset repository setup | |
| - Experiment data storage | |
| - Access configuration | |
| ## π§ Technical Improvements | |
| ### 1. **Modular Functions** | |
| ```bash | |
| # Input handling | |
| get_input() # Get user input with defaults | |
| select_option() # Select from options | |
| validate_hf_token() # Validate HF token | |
| # Configuration | |
| show_training_configs() # Display available configs | |
| get_training_config() # Get config based on selection | |
| create_training_config() # Generate config file | |
| # Output formatting | |
| print_status() # Success messages | |
| print_warning() # Warning messages | |
| print_error() # Error messages | |
| print_info() # Info messages | |
| print_header() # Header messages | |
| print_step() # Step messages | |
| ``` | |
| ### 2. **Configuration Selection Logic** | |
| ```bash | |
| case "$config_type" in | |
| "Basic Training") | |
| MODEL_NAME="HuggingFaceTB/SmolLM3-3B" | |
| DATASET_NAME="HuggingFaceTB/smoltalk" | |
| # ... other parameters | |
| ;; | |
| "A100 Large Scale") | |
| MODEL_NAME="HuggingFaceTB/SmolLM3-3B" | |
| DATASET_NAME="legmlai/openhermes-fr" | |
| # ... other parameters | |
| ;; | |
| # ... other configurations | |
| esac | |
| ``` | |
| ### 3. **Dynamic File Generation** | |
| ```bash | |
| # Generate training config | |
| create_training_config "$CONFIG_FILE" | |
| # Generate deployment input | |
| cat > deploy_input.txt << EOF | |
| $HF_USERNAME | |
| $TRACKIO_SPACE_NAME | |
| $HF_TOKEN | |
| EOF | |
| ``` | |
| ## π User Workflow | |
| ### **Before (Static)** | |
| 1. Edit `launch.sh` manually | |
| 2. Update hardcoded variables | |
| 3. Run script | |
| 4. Hope configuration is correct | |
| ### **After (Interactive)** | |
| 1. Run `./launch.sh` | |
| 2. Follow interactive prompts | |
| 3. Select training configuration | |
| 4. Confirm settings | |
| 5. Watch automated pipeline | |
| ## π― Benefits | |
| ### **For Users** | |
| - **No Manual Editing**: No need to edit script files | |
| - **Guided Experience**: Step-by-step prompts | |
| - **Validation**: Real-time input validation | |
| - **Flexibility**: Multiple configuration options | |
| - **Safety**: Confirmation before execution | |
| ### **For Developers** | |
| - **Maintainable**: Modular function structure | |
| - **Extensible**: Easy to add new configurations | |
| - **Robust**: Comprehensive error handling | |
| - **User-Friendly**: Clear feedback and guidance | |
| ### **For Different Use Cases** | |
| - **Beginners**: Basic Training configuration | |
| - **H100 Users**: H100 Lightweight for rapid experiments | |
| - **Researchers**: A100 Large Scale for serious experiments | |
| - **Production**: Multiple Passes for thorough training | |
| - **Custom**: User-defined parameters for specific needs | |
| ## π Configuration Examples | |
| ### **Quick Start (Basic Training)** | |
| ```bash | |
| ./launch.sh | |
| # Follow prompts: | |
| # 1. Enter HF username and token | |
| # 2. Select "Basic Training" | |
| # 3. Confirm settings | |
| # 4. Watch automated pipeline | |
| ``` | |
| ### **High-Performance Training (A100)** | |
| ```bash | |
| ./launch.sh | |
| # Follow prompts: | |
| # 1. Enter HF username and token | |
| # 2. Select "A100 Large Scale" | |
| # 3. Adjust parameters if needed | |
| # 4. Confirm and run | |
| ``` | |
| ### **Rapid Training (H100)** | |
| ```bash | |
| ./launch.sh | |
| # Follow prompts: | |
| # 1. Enter HF username and token | |
| # 2. Select "H100 Lightweight (Rapid)" | |
| # 3. Confirm settings | |
| # 4. Watch rapid training on H100 | |
| ``` | |
| ### **Custom Training** | |
| ```bash | |
| ./launch.sh | |
| # Follow prompts: | |
| # 1. Enter HF username and token | |
| # 2. Select "Custom Configuration" | |
| # 3. Enter custom parameters: | |
| # - Model: microsoft/DialoGPT-medium | |
| # - Dataset: your-custom-dataset | |
| # - Epochs: 5 | |
| # - Batch Size: 4 | |
| # - Learning Rate: 1e-5 | |
| # 4. Confirm and run | |
| ``` | |
| ## π Future Enhancements | |
| ### **Planned Improvements** | |
| - **GUI Interface**: Web-based configuration interface | |
| - **Configuration Templates**: Save/load custom configurations | |
| - **Advanced Validation**: More sophisticated input validation | |
| - **Progress Tracking**: Real-time progress indicators | |
| - **Rollback Capability**: Undo changes if needed | |
| ### **Extensibility** | |
| - **Plugin System**: Add custom training configurations | |
| - **API Integration**: Connect to external services | |
| - **Multi-GPU Support**: Distributed training options | |
| - **Advanced Monitoring**: Enhanced tracking capabilities | |
| ## π Migration Guide | |
| ### **For Existing Users** | |
| 1. **Backup**: Save your current `launch.sh` | |
| 2. **Update**: Replace with new interactive version | |
| 3. **Test**: Run with basic configuration first | |
| 4. **Migrate**: Use interactive prompts instead of manual editing | |
| ### **For New Users** | |
| 1. **Setup**: Run `python setup_launch.py` | |
| 2. **Check**: Run `python check_requirements.py` | |
| 3. **Launch**: Run `./launch.sh` | |
| 4. **Follow**: Use interactive prompts | |
| ## π Conclusion | |
| The interactive pipeline provides a much better user experience with: | |
| - **Guided Configuration**: No manual editing required | |
| - **Multiple Options**: Predefined configurations for different use cases | |
| - **Validation**: Real-time input validation and error handling | |
| - **Flexibility**: Custom configuration support | |
| - **Safety**: Confirmation steps and error recovery | |
| The script is now production-ready for users of all skill levels, from beginners to advanced researchers. |