Enhanced Trackio Interface Guide
Overview
Your Trackio application has been significantly enhanced to provide comprehensive monitoring and visualization for SmolLM3 training experiments. Here's how to make the most of it.
Key Enhancements
1. Real-time Visualization
- Interactive Plots: Loss curves, accuracy, learning rate, GPU metrics
- Experiment Comparison: Compare multiple training runs side-by-side
- Live Updates: Watch training progress in real-time
2. Comprehensive Data Display
- Formatted Output: Clean, emoji-rich experiment details
- Statistics Overview: Metrics count, parameters count, artifacts count
- Status Tracking: Visual status indicators (🟢 running, ✅ completed, ❌ failed)
3. Demo Data Generation
- Realistic Simulation: Generate realistic training metrics for testing
- Multiple Metrics: Loss, accuracy, learning rate, GPU memory, training time
- Configurable Parameters: Customize demo data to match your setup
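The demo-data idea above can be sketched in a few lines. This is a hypothetical generator (the function name `generate_demo_metrics` and the exact decay shape are assumptions, not the app's actual implementation): loss follows an exponential decay with a little noise, and the other metrics are derived from it.

```python
import math
import random

def generate_demo_metrics(num_steps=100, initial_loss=2.5, final_loss=0.5):
    """Simulate a realistic training curve: exponential loss decay plus noise.

    Hypothetical sketch; the app's built-in demo generator may differ.
    """
    random.seed(42)  # reproducible demo data
    metrics = []
    for step in range(1, num_steps + 1):
        progress = step / num_steps
        loss = final_loss + (initial_loss - final_loss) * math.exp(-3 * progress)
        loss += random.gauss(0, 0.02)  # small measurement noise
        metrics.append({
            "step": step * 25,  # assumes logging every 25 training steps
            "loss": round(loss, 4),
            "accuracy": round(1 - loss / initial_loss, 4),
            "learning_rate": 3.5e-6 * (1 - progress),  # linear LR decay
        })
    return metrics

demo = generate_demo_metrics()
```

Each entry is shaped like the metrics JSON the Log Metrics tab expects, so the output can be replayed through the interface for testing.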
How to Use with Your SmolLM3 Training
Step 1: Start Your Training
```bash
python run_a100_large_experiment.py \
  --config config/train_smollm3_openhermes_fr_a100_balanced.py \
  --trackio_url "https://tonic-test-trackio-test.hf.space" \
  --experiment-name "petit-elle-l-aime-3-balanced" \
  --output-dir ./outputs/balanced
```
Step 2: Monitor in Real-time
- Visit your Trackio Space: https://tonic-test-trackio-test.hf.space
- Go to the "View Experiments" tab
- Enter your experiment ID (e.g., exp_20231201_143022)
- Click "View Experiment" to see detailed information
Step 3: Visualize Training Progress
- Go to the "Visualizations" tab
- Enter your experiment ID
- Select a metric (loss, accuracy, learning_rate, gpu_memory, training_time)
- Click "Create Plot" to see interactive charts
Step 4: Compare Experiments
- In the "Visualizations" tab
- Enter multiple experiment IDs (comma-separated)
- Click "Compare Experiments" to see side-by-side comparison
Interface Features
Create Experiment Tab
- Experiment Name: Descriptive name for your training run
- Description: Detailed description of what you're training
- Automatic ID Generation: Unique experiment identifier
Log Metrics Tab
- Experiment ID: The experiment to log metrics for
- Metrics JSON: Training metrics in JSON format
- Step: Current training step (optional)
Example metrics JSON:
```json
{
  "loss": 0.5234,
  "accuracy": 0.8567,
  "learning_rate": 3.5e-6,
  "gpu_memory_gb": 22.5,
  "gpu_utilization_percent": 87.3,
  "training_time_per_step": 0.456
}
```
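If you build this payload in code rather than by hand, a small helper keeps the field names and rounding consistent. The helper name `format_metrics` is hypothetical; only the key names come from the example above.

```python
import json

def format_metrics(loss, accuracy, lr, gpu_mem_gb, gpu_util, step_time):
    """Build the metrics payload in the shape the Log Metrics tab expects.

    Hypothetical helper; field names match the example JSON in this guide.
    """
    payload = {
        "loss": round(loss, 4),
        "accuracy": round(accuracy, 4),
        "learning_rate": lr,
        "gpu_memory_gb": round(gpu_mem_gb, 1),
        "gpu_utilization_percent": round(gpu_util, 1),
        "training_time_per_step": round(step_time, 3),
    }
    return json.dumps(payload, indent=2)

metrics_json = format_metrics(0.5234, 0.8567, 3.5e-6, 22.5, 87.3, 0.456)
```

The resulting string can be pasted directly into the Metrics JSON field.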
Log Parameters Tab
- Experiment ID: The experiment to log parameters for
- Parameters JSON: Training configuration in JSON format
Example parameters JSON:
```json
{
  "model_name": "HuggingFaceTB/SmolLM3-3B",
  "batch_size": 8,
  "learning_rate": 3.5e-6,
  "max_iters": 18000,
  "mixed_precision": "bf16",
  "no_think_system_message": true
}
```
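If your training configuration already lives in a Python object, you can serialize it to this parameters JSON rather than retyping it. The `TrainConfig` dataclass below is a hypothetical stand-in; the real fields live in your config file (e.g., config/train_smollm3_openhermes_fr_a100_balanced.py).

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TrainConfig:
    """Hypothetical config shape mirroring the example parameters JSON."""
    model_name: str = "HuggingFaceTB/SmolLM3-3B"
    batch_size: int = 8
    learning_rate: float = 3.5e-6
    max_iters: int = 18000
    mixed_precision: str = "bf16"
    no_think_system_message: bool = True

# asdict() flattens the dataclass into a plain dict for JSON serialization
params_json = json.dumps(asdict(TrainConfig()), indent=2)
```

Pasting `params_json` into the Parameters JSON field keeps the logged configuration in sync with the code.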
View Experiments Tab
- Experiment ID: Enter to view specific experiment
- List All Experiments: Shows overview of all experiments
- Detailed Information: Formatted display with statistics
Visualizations Tab
- Training Metrics: Interactive plots for individual metrics
- Experiment Comparison: Side-by-side comparison of multiple runs
- Real-time Updates: Plots update as new data is logged
Demo Data Tab
- Generate Demo Data: Create realistic training data for testing
- Configurable: Adjust parameters to match your setup
- Multiple Metrics: Simulates loss, accuracy, GPU metrics, etc.
Update Status Tab
- Experiment ID: The experiment to update
- Status: running, completed, failed, paused
- Visual Indicators: Status shown with emojis
What Gets Displayed
Training Metrics
- Loss: Training loss over time
- Accuracy: Model accuracy progression
- Learning Rate: Learning rate scheduling
- GPU Memory: Memory usage in GB
- GPU Utilization: GPU usage percentage
- Training Time: Time per training step
Experiment Details
- Basic Info: ID, name, description, status, creation time
- Statistics: Metrics count, parameters count, artifacts count
- Parameters: All training configuration
- Latest Metrics: Most recent training metrics
Visualizations
- Line Charts: Smooth curves showing metric progression
- Interactive Hover: Detailed information on hover
- Multiple Metrics: Switch between different metrics
- Comparison Charts: Side-by-side experiment comparison
Integration with Your Training
Automatic Integration
Your training script automatically:
- Creates experiments with your specified name
- Logs parameters from your configuration
- Logs metrics every 25 steps (configurable)
- Logs system metrics (GPU memory, utilization)
- Logs checkpoints every 2000 steps
- Updates status when training completes
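The logging cadence above can be sketched as a minimal loop. Everything here is a stand-in (the `DummyMonitor` class and the fake loss are assumptions); the real hooks live inside run_a100_large_experiment.py and the monitor object your training script creates.

```python
class DummyMonitor:
    """Stand-in for the real Trackio monitor; records what would be logged."""
    def __init__(self):
        self.logged = []

    def log_metrics(self, metrics, step):
        self.logged.append((step, metrics))

monitor = DummyMonitor()
LOG_EVERY = 25    # metrics interval (configurable, as noted above)
TOTAL_STEPS = 100

for step in range(1, TOTAL_STEPS + 1):
    loss = 2.5 / step  # stand-in for the real training step's loss
    if step % LOG_EVERY == 0:
        monitor.log_metrics({"loss": round(loss, 4)}, step=step)
```

With a 25-step interval over 100 steps, the monitor receives four metric batches, at steps 25, 50, 75, and 100.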
Manual Integration
You can also manually:
- Create experiments through the interface
- Log custom metrics for specific analysis
- Compare different runs with different parameters
- Generate demo data for testing the interface
Customization
Adding Custom Metrics
```python
# In your training script
custom_metrics = {
    "loss": current_loss,
    "accuracy": current_accuracy,
    "custom_metric": your_custom_value,
    "gpu_memory": gpu_memory_usage,
}
monitor.log_metrics(custom_metrics, step=current_step)
```
Custom Visualizations
The interface supports any metric you log. Just add it to your metrics JSON and it will appear in the visualization dropdown.
Troubleshooting
No Data Displayed
- Check experiment ID: Make sure you're using the correct ID
- Verify metrics were logged: Check if training is actually logging metrics
- Use demo data: Generate demo data to test the interface
Plots Not Updating
- Refresh the page: Sometimes plots need a refresh
- Check data format: Ensure metrics are in the correct JSON format
- Verify step numbers: Make sure step numbers are increasing
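The last two checks can be automated. This is a hypothetical validator (the name `validate_metric_log` is an assumption): it confirms each logged entry parses as a JSON object and that step numbers strictly increase.

```python
import json

def validate_metric_log(raw_entries):
    """Check each (json_string, step) entry parses and steps strictly increase.

    Hypothetical helper for debugging logs before blaming the interface.
    """
    last_step = -1
    for raw, step in raw_entries:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            return f"invalid JSON at step {step}: {exc}"
        if not isinstance(parsed, dict):
            return f"metrics at step {step} must be a JSON object"
        if step <= last_step:
            return f"step {step} does not increase (last was {last_step})"
        last_step = step
    return "ok"
```

Running it over the entries you believe you logged quickly distinguishes a formatting problem from a plotting problem.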
Interface Not Loading
- Check dependencies: Ensure plotly and pandas are installed
- Check Gradio version: Use Gradio 4.0.0 or higher
- Check browser console: Look for JavaScript errors
Example Workflow
Start Training:
```bash
python run_a100_large_experiment.py --experiment-name "my_experiment"
```
Monitor Progress:
- Visit your Trackio Space
- Go to "View Experiments"
- Enter your experiment ID
- Watch real-time updates
Visualize Results:
- Go to the "Visualizations" tab
- Select "loss" metric
- Create plot to see training progress
Compare Runs:
- Run multiple experiments with different parameters
- Use "Compare Experiments" to see differences
Generate Demo Data:
- Use the "Demo Data" tab to test the interface
- Generate realistic training data for demonstration
Success Indicators
Your interface is working correctly when you see:
- ✅ Formatted experiment details with emojis and structure
- ✅ Interactive plots that respond to your inputs
- ✅ Real-time metric updates during training
- ✅ Clean experiment overview with statistics
- ✅ Smooth visualization with hover information
The enhanced interface will now display much more meaningful information and provide a comprehensive monitoring experience for your SmolLM3 training experiments!