# Enhanced Trackio Interface Guide

## Overview

Your Trackio application has been significantly enhanced to provide comprehensive monitoring and visualization for SmolLM3 training experiments. Here's how to make the most of it.

## Key Enhancements

### 1. **Real-time Visualization**
- **Interactive Plots**: Loss curves, accuracy, learning rate, GPU metrics
- **Experiment Comparison**: Compare multiple training runs side-by-side
- **Live Updates**: Watch training progress in real-time

### 2. **Comprehensive Data Display**
- **Formatted Output**: Clean, emoji-rich experiment details
- **Statistics Overview**: Metrics count, parameters count, artifacts count
- **Status Tracking**: Visual status indicators for running, completed, and failed experiments

### 3. **Demo Data Generation**
- **Realistic Simulation**: Generate realistic training metrics for testing
- **Multiple Metrics**: Loss, accuracy, learning rate, GPU memory, training time
- **Configurable Parameters**: Customize demo data to match your setup

## How to Use with Your SmolLM3 Training

### Step 1: Start Your Training

```bash
python run_a100_large_experiment.py \
    --config config/train_smollm3_openhermes_fr_a100_balanced.py \
    --trackio_url "https://tonic-test-trackio-test.hf.space" \
    --experiment-name "petit-elle-l-aime-3-balanced" \
    --output-dir ./outputs/balanced
```

### Step 2: Monitor in Real-time

1. **Visit your Trackio Space**: `https://tonic-test-trackio-test.hf.space`
2. **Go to the "View Experiments" tab**
3. **Enter your experiment ID** (e.g., `exp_20231201_143022`)
4. **Click "View Experiment"** to see detailed information

### Step 3: Visualize Training Progress

1. **Go to the "Visualizations" tab**
2. **Enter your experiment ID**
3. **Select a metric** (loss, accuracy, learning_rate, gpu_memory, training_time)
4. **Click "Create Plot"** to see interactive charts

### Step 4: Compare Experiments

1. **In the "Visualizations" tab**
2. **Enter multiple experiment IDs** (comma-separated)
3. **Click "Compare Experiments"** to see a side-by-side comparison

## Interface Features

### Create Experiment Tab
- **Experiment Name**: Descriptive name for your training run
- **Description**: Detailed description of what you're training
- **Automatic ID Generation**: Unique experiment identifier
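
The interface generates this identifier for you. Purely for reference, the example IDs used throughout this guide (e.g. `exp_20231201_143022`) look like timestamps; assuming a hypothetical `exp_<YYYYMMDD>_<HHMMSS>` pattern, such an ID could be produced like this:

```python
from datetime import datetime

# Hypothetical sketch only: the interface creates the real ID; this just mirrors
# the exp_<YYYYMMDD>_<HHMMSS> pattern seen in the example IDs of this guide.
experiment_id = f"exp_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
print(experiment_id)  # e.g. exp_20231201_143022
```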

### Log Metrics Tab
- **Experiment ID**: The experiment to log metrics for
- **Metrics JSON**: Training metrics in JSON format
- **Step**: Current training step (optional)

Example metrics JSON:

```json
{
  "loss": 0.5234,
  "accuracy": 0.8567,
  "learning_rate": 3.5e-6,
  "gpu_memory_gb": 22.5,
  "gpu_utilization_percent": 87.3,
  "training_time_per_step": 0.456
}
```
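
If you prefer to log these values from code rather than pasting JSON into the tab, a minimal sketch would look like the following. It assumes the `monitor` object from your training setup (the same one whose `log_metrics(metrics, step=...)` call appears in the Customization section below); the values and step number are the placeholders from the example above.

```python
# Minimal sketch: logging the example metrics from a script instead of the UI.
# Assumes `monitor` is the monitoring object from your training setup.
metrics = {
    "loss": 0.5234,
    "accuracy": 0.8567,
    "learning_rate": 3.5e-6,
    "gpu_memory_gb": 22.5,
    "gpu_utilization_percent": 87.3,
    "training_time_per_step": 0.456,
}
monitor.log_metrics(metrics, step=100)  # step number is illustrative
```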

### Log Parameters Tab
- **Experiment ID**: The experiment to log parameters for
- **Parameters JSON**: Training configuration in JSON format

Example parameters JSON:

```json
{
  "model_name": "HuggingFaceTB/SmolLM3-3B",
  "batch_size": 8,
  "learning_rate": 3.5e-6,
  "max_iters": 18000,
  "mixed_precision": "bf16",
  "no_think_system_message": true
}
```
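
A convenient way to produce this JSON is to serialize the configuration dictionary you already have in Python, using only the standard library:

```python
import json

# Sketch: dump an existing configuration dict so it can be pasted into the
# "Parameters JSON" field of the Log Parameters tab.
params = {
    "model_name": "HuggingFaceTB/SmolLM3-3B",
    "batch_size": 8,
    "learning_rate": 3.5e-6,
    "max_iters": 18000,
    "mixed_precision": "bf16",
    "no_think_system_message": True,
}
print(json.dumps(params, indent=2))
```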

### View Experiments Tab
- **Experiment ID**: Enter an ID to view a specific experiment
- **List All Experiments**: Shows an overview of all experiments
- **Detailed Information**: Formatted display with statistics

### Visualizations Tab
- **Training Metrics**: Interactive plots for individual metrics
- **Experiment Comparison**: Side-by-side comparison of multiple runs
- **Real-time Updates**: Plots update as new data is logged

### Demo Data Tab
- **Generate Demo Data**: Create realistic training data for testing
- **Configurable**: Adjust parameters to match your setup
- **Multiple Metrics**: Simulates loss, accuracy, GPU metrics, etc.
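
For reference, the kind of data this tab simulates can be approximated in a few lines of Python. This is only an illustrative sketch, not the app's actual generator, assuming a smoothly decaying loss and rising accuracy with a little noise, sampled every 25 steps:

```python
import numpy as np

# Illustrative demo-data generator: decaying loss and rising accuracy with noise.
rng = np.random.default_rng(42)
steps = np.arange(25, 1001, 25)
loss = 0.5 + 2.0 * np.exp(-steps / 400) + rng.normal(0, 0.02, steps.size)
accuracy = np.clip(0.2 + 0.7 * (1 - np.exp(-steps / 400)) + rng.normal(0, 0.01, steps.size), 0, 1)

demo_metrics = [
    {"step": int(s), "loss": round(float(l), 4), "accuracy": round(float(a), 4)}
    for s, l, a in zip(steps, loss, accuracy)
]
print(demo_metrics[0])  # first simulated logging step
```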

### Update Status Tab
- **Experiment ID**: The experiment to update
- **Status**: running, completed, failed, paused
- **Visual Indicators**: Status shown with emojis

## What Gets Displayed

### Training Metrics
- **Loss**: Training loss over time
- **Accuracy**: Model accuracy progression
- **Learning Rate**: Learning rate scheduling
- **GPU Memory**: Memory usage in GB
- **GPU Utilization**: GPU usage percentage
- **Training Time**: Time per training step
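
If you want to collect the GPU and timing metrics yourself rather than rely on the automatic system logging, a rough PyTorch-based sketch is shown below. The helper name is ours, not part of Trackio, and `torch.cuda.utilization()` needs the `pynvml` package installed.

```python
import time
import torch

def collect_step_metrics(step_start_time: float) -> dict:
    """Gather the GPU and timing metrics listed above for one training step."""
    metrics = {"training_time_per_step": time.time() - step_start_time}
    if torch.cuda.is_available():
        metrics["gpu_memory_gb"] = torch.cuda.memory_allocated() / 1e9
        metrics["gpu_utilization_percent"] = float(torch.cuda.utilization())
    return metrics
```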

### Experiment Details
- **Basic Info**: ID, name, description, status, creation time
- **Statistics**: Metrics count, parameters count, artifacts count
- **Parameters**: All training configuration values
- **Latest Metrics**: Most recent training metrics

### Visualizations
- **Line Charts**: Smooth curves showing metric progression
- **Interactive Hover**: Detailed information on hover
- **Multiple Metrics**: Switch between different metrics
- **Comparison Charts**: Side-by-side experiment comparison
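
The interface's charts rely on plotly (listed as a dependency in Troubleshooting). A minimal local equivalent of a single-metric line chart with hover details, using placeholder values rather than data pulled from the Space, looks like this:

```python
import plotly.graph_objects as go

# Single-metric line chart with hover details, similar to the Visualizations tab.
steps = [25, 50, 75, 100]        # placeholder step numbers
loss = [1.92, 1.61, 1.44, 1.30]  # placeholder loss values
fig = go.Figure(
    go.Scatter(
        x=steps, y=loss, mode="lines+markers", name="loss",
        hovertemplate="step %{x}<br>loss %{y:.4f}<extra></extra>",
    )
)
fig.update_layout(title="Training Loss", xaxis_title="Step", yaxis_title="Loss")
fig.show()
```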

## Integration with Your Training

### Automatic Integration

Your training script automatically:
1. **Creates experiments** with your specified name
2. **Logs parameters** from your configuration
3. **Logs metrics** every 25 steps (configurable)
4. **Logs system metrics** (GPU memory, utilization)
5. **Logs checkpoints** every 2000 steps
6. **Updates status** when training completes
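
The cadence above maps onto a simple pattern inside a training loop. The sketch below is illustrative only: `train_step` and `save_checkpoint` stand in for your own code, and only the `monitor.log_metrics(...)` call is taken from this guide.

```python
# Illustrative logging cadence; train_step, save_checkpoint and monitor are
# placeholders for your own training code and monitoring object.
max_iters = 18000  # matches the example configuration above

for step in range(1, max_iters + 1):
    loss = train_step()
    if step % 25 == 0:       # metrics every 25 steps (configurable)
        monitor.log_metrics({"loss": loss}, step=step)
    if step % 2000 == 0:     # checkpoints every 2000 steps
        save_checkpoint(step)
```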

### Manual Integration

You can also manually:
1. **Create experiments** through the interface
2. **Log custom metrics** for specific analysis
3. **Compare different runs** with different parameters
4. **Generate demo data** for testing the interface

## Customization

### Adding Custom Metrics

```python
# In your training script; current_loss, current_accuracy, your_custom_value and
# gpu_memory_usage are values produced by your own training loop.
custom_metrics = {
    "loss": current_loss,
    "accuracy": current_accuracy,
    "custom_metric": your_custom_value,
    "gpu_memory": gpu_memory_usage
}
monitor.log_metrics(custom_metrics, step=current_step)
```

### Custom Visualizations

The interface supports any metric you log. Just add it to your metrics JSON and it will appear in the visualization dropdown.

## Troubleshooting

### No Data Displayed
1. **Check the experiment ID**: Make sure you're using the correct ID
2. **Verify metrics were logged**: Confirm that training is actually logging metrics
3. **Use demo data**: Generate demo data to test the interface

### Plots Not Updating
1. **Refresh the page**: Plots sometimes need a manual refresh to pick up new data
2. **Check the data format**: Ensure metrics are in the correct JSON format
3. **Verify step numbers**: Make sure step numbers are increasing

### Interface Not Loading
1. **Check dependencies**: Ensure plotly and pandas are installed
2. **Check the Gradio version**: Use Gradio 4.0.0 or higher
3. **Check the browser console**: Look for JavaScript errors
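
A quick way to confirm the dependency and version requirements above from a Python shell:

```python
# Quick check of the dependencies mentioned above.
import gradio
import pandas
import plotly

print("gradio", gradio.__version__)   # should be 4.0.0 or higher
print("plotly", plotly.__version__)
print("pandas", pandas.__version__)
```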

## Example Workflow

1. **Start Training**:
   ```bash
   python run_a100_large_experiment.py --experiment-name "my_experiment"
   ```
2. **Monitor Progress**:
   - Visit your Trackio Space
   - Go to "View Experiments"
   - Enter your experiment ID
   - Watch real-time updates
3. **Visualize Results**:
   - Go to "Visualizations"
   - Select the "loss" metric
   - Create a plot to see training progress
4. **Compare Runs**:
   - Run multiple experiments with different parameters
   - Use "Compare Experiments" to see the differences
5. **Generate Demo Data**:
   - Use the "Demo Data" tab to test the interface
   - Generate realistic training data for demonstration

## Success Indicators

Your interface is working correctly when you see:
- **Formatted experiment details** with emojis and structure
- **Interactive plots** that respond to your inputs
- **Real-time metric updates** during training
- **A clean experiment overview** with statistics
- **Smooth visualization** with hover information

The enhanced interface will now display much more meaningful information and provide a comprehensive monitoring experience for your SmolLM3 training experiments!