Spaces:

Tonic
/

SmolFactory

Running

App Files Files Community

SmolFactory / docs /INTERACTIVE_PIPELINE_IMPROVEMENTS.md

Tonic

merge tonic into main for major refactor

42f4411 verified 4 months ago

preview code

raw

history blame

8.64 kB

	# Interactive Pipeline Improvements

	This document explains the improvements made to the `launch.sh` script to make it interactive and configurable for different training scenarios.

	## 🎯 Key Improvements

	### 1. Interactive User Interface
	- Colored Output: Added color-coded status messages for better UX
	- Input Validation: Real-time validation of user inputs
	- Default Values: Smart defaults for common configurations
	- Error Handling: Graceful error handling with helpful messages

	### 2. Training Configuration Selection
	The script now offers 4 predefined training configurations:

	#### Basic Training (Default)
	```bash
	Model: SmolLM3-3B
	Dataset: SmolTalk
	Epochs: 3
	Batch Size: 2
	Learning Rate: 5e-6
	Sequence Length: 4096
	Best for: Quick experiments, learning
	```

	#### H100 Lightweight (Rapid)
	```bash
	Model: SmolLM3-3B
	Dataset: OpenHermes-FR (80K samples)
	Epochs: 1
	Batch Size: 16
	Learning Rate: 8e-6
	Sequence Length: 8192
	Best for: Rapid training on H100
	```

	#### A100 Large Scale
	```bash
	Model: SmolLM3-3B
	Dataset: OpenHermes-FR
	Epochs: 1.3 passes
	Batch Size: 8
	Learning Rate: 5e-6
	Sequence Length: 8192
	Best for: High-performance training
	```

	#### Multiple Passes
	```bash
	Model: SmolLM3-3B
	Dataset: OpenHermes-FR
	Epochs: 4 passes
	Batch Size: 6
	Learning Rate: 3e-6
	Sequence Length: 8192
	Best for: Thorough training
	```

	#### Custom Configuration
	- User-defined parameters
	- Flexible model and dataset selection
	- Custom training parameters

	### 3. Enhanced User Experience

	#### Step-by-Step Guidance
	1. Authentication - HF username and token validation
	2. Configuration Selection - Choose from predefined configs
	3. Experiment Setup - Configure experiment details
	4. Training Parameters - Adjust hyperparameters
	5. Deployment Setup - Trackio Space configuration
	6. Confirmation - Review and confirm settings

	#### Input Functions
	```bash
	# Get input with default value
	get_input "Prompt" "default_value" VARIABLE_NAME

	# Select from options
	select_option "Choose option:" "Option 1" "Option 2" "Option 3" VARIABLE_NAME

	# Validate HF token
	validate_hf_token "$HF_TOKEN"
	```

	#### Colored Output Functions
	```bash
	print_status "Success message" # Green ✅
	print_warning "Warning message" # Yellow ⚠️
	print_error "Error message" # Red ❌
	print_info "Info message" # Blue ℹ️
	print_header "Header message" # Purple 🚀
	print_step "Step message" # Cyan 📋
	```

	### 4. Dynamic Configuration Generation

	The script now generates training configurations based on user selection:

	```python
	# Generated config file
	config = SmolLM3Config(
	model_name="$MODEL_NAME",
	max_seq_length=$MAX_SEQ_LENGTH,
	batch_size=$BATCH_SIZE,
	learning_rate=$LEARNING_RATE,
	# ... other parameters
	)
	```

	### 5. Improved Error Handling

	#### Input Validation
	- Required field validation
	- HF token validation
	- Numeric input validation
	- Choice validation

	#### Graceful Degradation
	- Clear error messages
	- Recovery suggestions
	- Exit on critical errors

	### 6. Configuration Management

	#### User Credentials
	- Interactive username input
	- Secure token input
	- Real-time token validation

	#### Experiment Details
	- Dynamic experiment naming
	- Repository name generation
	- Dataset repository configuration

	#### Training Parameters
	- Batch size selection
	- Learning rate adjustment
	- Sequence length configuration
	- Save/eval/logging steps

	### 7. Enhanced Monitoring Integration

	#### Trackio Space
	- Dynamic space naming
	- Automatic deployment
	- URL generation

	#### HF Datasets
	- Dataset repository setup
	- Experiment data storage
	- Access configuration

	## 🔧 Technical Improvements

	### 1. Modular Functions
	```bash
	# Input handling
	get_input() # Get user input with defaults
	select_option() # Select from options
	validate_hf_token() # Validate HF token

	# Configuration
	show_training_configs() # Display available configs
	get_training_config() # Get config based on selection
	create_training_config() # Generate config file

	# Output formatting
	print_status() # Success messages
	print_warning() # Warning messages
	print_error() # Error messages
	print_info() # Info messages
	print_header() # Header messages
	print_step() # Step messages
	```

	### 2. Configuration Selection Logic
	```bash
	case "$config_type" in
	"Basic Training")
	MODEL_NAME="HuggingFaceTB/SmolLM3-3B"
	DATASET_NAME="HuggingFaceTB/smoltalk"
	# ... other parameters
	;;
	"A100 Large Scale")
	MODEL_NAME="HuggingFaceTB/SmolLM3-3B"
	DATASET_NAME="legmlai/openhermes-fr"
	# ... other parameters
	;;
	# ... other configurations
	esac
	```

	### 3. Dynamic File Generation
	```bash
	# Generate training config
	create_training_config "$CONFIG_FILE"

	# Generate deployment input
	cat > deploy_input.txt << EOF
	$HF_USERNAME
	$TRACKIO_SPACE_NAME
	$HF_TOKEN
	EOF
	```

	## 📊 User Workflow

	### Before (Static)
	1. Edit `launch.sh` manually
	2. Update hardcoded variables
	3. Run script
	4. Hope configuration is correct

	### After (Interactive)
	1. Run `./launch.sh`
	2. Follow interactive prompts
	3. Select training configuration
	4. Confirm settings
	5. Watch automated pipeline

	## 🎯 Benefits

	### For Users
	- No Manual Editing: No need to edit script files
	- Guided Experience: Step-by-step prompts
	- Validation: Real-time input validation
	- Flexibility: Multiple configuration options
	- Safety: Confirmation before execution

	### For Developers
	- Maintainable: Modular function structure
	- Extensible: Easy to add new configurations
	- Robust: Comprehensive error handling
	- User-Friendly: Clear feedback and guidance

	### For Different Use Cases
	- Beginners: Basic Training configuration
	- H100 Users: H100 Lightweight for rapid experiments
	- Researchers: A100 Large Scale for serious experiments
	- Production: Multiple Passes for thorough training
	- Custom: User-defined parameters for specific needs

	## 🔄 Configuration Examples

	### Quick Start (Basic Training)
	```bash
	./launch.sh
	# Follow prompts:
	# 1. Enter HF username and token
	# 2. Select "Basic Training"
	# 3. Confirm settings
	# 4. Watch automated pipeline
	```

	### High-Performance Training (A100)
	```bash
	./launch.sh
	# Follow prompts:
	# 1. Enter HF username and token
	# 2. Select "A100 Large Scale"
	# 3. Adjust parameters if needed
	# 4. Confirm and run
	```

	### Rapid Training (H100)
	```bash
	./launch.sh
	# Follow prompts:
	# 1. Enter HF username and token
	# 2. Select "H100 Lightweight (Rapid)"
	# 3. Confirm settings
	# 4. Watch rapid training on H100
	```

	### Custom Training
	```bash
	./launch.sh
	# Follow prompts:
	# 1. Enter HF username and token
	# 2. Select "Custom Configuration"
	# 3. Enter custom parameters:
	# - Model: microsoft/DialoGPT-medium
	# - Dataset: your-custom-dataset
	# - Epochs: 5
	# - Batch Size: 4
	# - Learning Rate: 1e-5
	# 4. Confirm and run
	```

	## 🚀 Future Enhancements

	### Planned Improvements
	- GUI Interface: Web-based configuration interface
	- Configuration Templates: Save/load custom configurations
	- Advanced Validation: More sophisticated input validation
	- Progress Tracking: Real-time progress indicators
	- Rollback Capability: Undo changes if needed

	### Extensibility
	- Plugin System: Add custom training configurations
	- API Integration: Connect to external services
	- Multi-GPU Support: Distributed training options
	- Advanced Monitoring: Enhanced tracking capabilities

	## 📋 Migration Guide

	### For Existing Users
	1. Backup: Save your current `launch.sh`
	2. Update: Replace with new interactive version
	3. Test: Run with basic configuration first
	4. Migrate: Use interactive prompts instead of manual editing

	### For New Users
	1. Setup: Run `python setup_launch.py`
	2. Check: Run `python check_requirements.py`
	3. Launch: Run `./launch.sh`
	4. Follow: Use interactive prompts

	## 🎉 Conclusion

	The interactive pipeline provides a much better user experience with:
	- Guided Configuration: No manual editing required
	- Multiple Options: Predefined configurations for different use cases
	- Validation: Real-time input validation and error handling
	- Flexibility: Custom configuration support
	- Safety: Confirmation steps and error recovery

	The script is now production-ready for users of all skill levels, from beginners to advanced researchers.