# Trackio Deployment Guide for Hugging Face Spaces
This guide provides step-by-step instructions for deploying Trackio experiment tracking to Hugging Face Spaces and integrating it with your SmolLM3 fine-tuning pipeline.
## Prerequisites
- Hugging Face account
- Hugging Face CLI installed (`pip install huggingface_hub`)
- Git configured with your Hugging Face credentials
## Method 1: Automated Deployment (Recommended)
### Step 1: Run the Deployment Script
```bash
python deploy_trackio_space.py
```
The script will prompt you for:
- Your Hugging Face username
- Space name (e.g., `trackio-monitoring`)
- Hugging Face token (must have write access)
### Step 2: Wait for Build
After deployment, wait 2-5 minutes for the Space to build and become available.
### Step 3: Test the Interface
Visit your Space URL to test the interface:
```
https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
```
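
The guide uses two kinds of URL: the Hub page shown above and the direct `*.hf.space` address passed to the training script later on. The sketch below derives both from a username and Space name. `space_urls` is a hypothetical helper, and the slug rule (lowercase, other characters replaced by hyphens) is an assumption; confirm the direct URL in your Space before relying on it.

```python
import re


def space_urls(username: str, space_name: str) -> dict:
    """Return the Hub page URL and the assumed direct app URL for a Space."""
    page = f"https://huggingface.co/spaces/{username}/{space_name}"
    # Assumption: the direct host is "<owner>-<space>" lowercased, with
    # anything other than letters, digits, and hyphens replaced by "-".
    slug = re.sub(r"[^a-z0-9-]", "-", f"{username}-{space_name}".lower())
    return {"page": page, "app": f"https://{slug}.hf.space"}


urls = space_urls("your-username", "trackio-monitoring")
print(urls["page"])  # https://huggingface.co/spaces/your-username/trackio-monitoring
print(urls["app"])   # https://your-username-trackio-monitoring.hf.space
```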
## Method 2: Manual Deployment
### Step 1: Create a New Space
1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Configure the Space:
   - **Owner**: Your username
   - **Space name**: `trackio-monitoring` (or your preferred name)
   - **SDK**: Gradio
   - **Hardware**: CPU (Basic)
   - **License**: MIT
### Step 2: Upload Files
Upload these files to your Space:
#### `app.py`
The main Gradio interface (already created in this repository)
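#### `requirements_space.txt`
Spaces install Python dependencies from a file named `requirements.txt`, so upload this file to the Space under that name.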
```
gradio>=4.0.0
gradio-client>=0.10.0
requests>=2.31.0
numpy>=1.24.0
pandas>=2.0.0
jsonschema>=4.17.0
plotly>=5.15.0
matplotlib>=3.7.0
python-dotenv>=1.0.0
```
#### `README.md`
````markdown
# Trackio Experiment Tracking
A Gradio interface for experiment tracking and monitoring.
## Features
- Create and manage experiments
- Log training metrics and parameters
- View experiment details and results
- Update experiment status
## Usage
1. Create a new experiment using the "Create Experiment" tab
2. Log metrics during training using the "Log Metrics" tab
3. View experiment details using the "View Experiments" tab
4. Update experiment status using the "Update Status" tab
## Integration
To connect your training script to this Trackio Space:
```python
from monitoring import SmolLM3Monitor
monitor = SmolLM3Monitor(
    experiment_name="my_experiment",
    trackio_url="https://your-space.hf.space",
    enable_tracking=True
)
```
````
### Step 3: Configure Space Settings
In your Space settings, ensure:
- **App file**: `app.py`
- **Python version**: 3.9 or higher
- **Hardware**: CPU (Basic) is sufficient
## Integration with Your Training Script
### Step 1: Update Your Configuration
Add Trackio settings to your training configuration:
```python
# config/train_smollm3.py
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmolLM3Config:
    # ... existing settings ...
    # Trackio monitoring configuration
    enable_tracking: bool = True
    trackio_url: Optional[str] = None  # Your Space URL
    trackio_token: Optional[str] = None
    log_artifacts: bool = True
    log_metrics: bool = True
    log_config: bool = True
    experiment_name: Optional[str] = None
```
### Step 2: Run Training with Trackio
```bash
python train.py config/train_smollm3.py \
  --dataset_dir my_dataset \
  --enable_tracking \
  --trackio_url "https://your-username-trackio-monitoring.hf.space" \
  --experiment_name "smollm3_finetune_v1"
```
### Step 3: Monitor Your Experiments
1. **Create Experiment**: Use the "Create Experiment" tab in your Space
2. **Log Metrics**: Your training script will automatically log metrics
3. **View Results**: Use the "View Experiments" tab to see progress
4. **Update Status**: Mark experiments as completed when done
## Advanced Configuration
### Environment Variables
You can set Trackio configuration via environment variables:
```bash
export TRACKIO_URL="https://your-space.hf.space"
export TRACKIO_TOKEN="your_token_here"
```
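
A training script could pick these variables up roughly as sketched below. `TrackioSettings` and `load_trackio_settings` are hypothetical names used for illustration and are not taken from the repository; only the two variable names come from this guide. The sketch assumes tracking should be switched off when no URL is set.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class TrackioSettings:
    """Hypothetical container mirroring the Trackio fields of the config."""
    trackio_url: Optional[str] = None
    trackio_token: Optional[str] = None
    enable_tracking: bool = True


def load_trackio_settings(env: Optional[Mapping[str, str]] = None) -> TrackioSettings:
    """Read TRACKIO_URL and TRACKIO_TOKEN; disable tracking if no URL is set."""
    env = os.environ if env is None else env
    url = env.get("TRACKIO_URL") or None      # treat an empty string as unset
    token = env.get("TRACKIO_TOKEN") or None
    return TrackioSettings(
        trackio_url=url,
        trackio_token=token,
        enable_tracking=url is not None,
    )


settings = load_trackio_settings()
# Print only whether tracking is on; never print the token itself.
print("tracking enabled:", settings.enable_tracking)
```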
### Custom Experiment Names
```bash
python train.py config/train_smollm3.py \
  --experiment_name "smollm3_high_lr_experiment" \
  --trackio_url "https://your-space.hf.space"
```
### Multiple Experiments
You can run multiple experiments and track them separately:
```bash
# Experiment 1
python train.py config/train_smollm3.py \
  --experiment_name "smollm3_baseline" \
  --learning_rate 2e-5
# Experiment 2
python train.py config/train_smollm3.py \
  --experiment_name "smollm3_high_lr" \
  --learning_rate 5e-5
```
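
When the list of runs grows, the same commands can be generated from a table of names and learning rates. This is a minimal sketch: `build_command` is a hypothetical helper, and it only reuses the flags shown in the commands above. It prints the commands rather than running them.

```python
import shlex

# Experiment names and learning rates taken from the two commands above.
EXPERIMENTS = {"smollm3_baseline": 2e-5, "smollm3_high_lr": 5e-5}


def build_command(name: str, learning_rate: float) -> list:
    """Build the train.py argument list for one experiment."""
    return [
        "python", "train.py", "config/train_smollm3.py",
        "--experiment_name", name,
        "--learning_rate", str(learning_rate),
    ]


for name, lr in EXPERIMENTS.items():
    cmd = build_command(name, lr)
    print(shlex.join(cmd))
    # To launch the run for real: import subprocess, then
    # subprocess.run(cmd, check=True)
```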
## Using the Trackio Interface
### Creating Experiments
1. Go to the "Create Experiment" tab
2. Enter experiment name (e.g., "smollm3_finetune_v1")
3. Add description (optional)
4. Click "Create Experiment"
5. Note the experiment ID for logging metrics
### Logging Metrics
1. Go to the "Log Metrics" tab
2. Enter your experiment ID
3. Add metrics in JSON format:
   ```json
   {
     "loss": 0.5,
     "accuracy": 0.85,
     "learning_rate": 2e-5
   }
   ```
4. Add step number (optional)
5. Click "Log Metrics"
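
If you are copying metrics out of a Python session, it helps to produce the JSON programmatically so that it is well formed. The sketch below uses a hypothetical helper, `metrics_to_json`, and rejects NaN and infinite values because `json.dumps` would otherwise emit them as `NaN` or `Infinity`, which are not valid JSON. Whether the Space applies further validation is not covered here.

```python
import json
import math


def metrics_to_json(metrics: dict) -> str:
    """Serialize metrics, rejecting values that are not finite numbers."""
    for key, value in metrics.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{key!r} must be a number, got {type(value).__name__}")
        if not math.isfinite(value):
            raise ValueError(f"{key!r} is not finite: {value}")
    return json.dumps(metrics, indent=2)


print(metrics_to_json({"loss": 0.5, "accuracy": 0.85, "learning_rate": 2e-5}))
```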
### Viewing Experiments
1. Go to the "View Experiments" tab
2. Enter experiment ID to view specific experiment
3. Or click "List All Experiments" to see all experiments
### Updating Status
1. Go to the "Update Status" tab
2. Enter experiment ID
3. Select new status (running, completed, failed, paused)
4. Click "Update Status"
## Troubleshooting
### Common Issues
#### 1. Space Not Building
- Check that all required files are uploaded
- Verify `app.py` is the main file
- Check the Space logs for errors
#### 2. Connection Errors
- Verify your Space URL is correct
- Check that the Space is running (not paused)
- Ensure your training script can reach the Space URL
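
One way to run these checks from the machine that does the training is a plain HTTP request using only the standard library. `check_space` is a hypothetical helper, not part of Trackio, and it only distinguishes "the server answered" from "the server could not be reached"; it does not verify that the Gradio app itself is healthy.

```python
import urllib.error
import urllib.request


def check_space(url: str, timeout: float = 10.0) -> str:
    """Return a short diagnosis for a Space URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"reachable (HTTP {response.status})"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. a wrong URL).
        return f"server responded with HTTP {exc.code}"
    except (OSError, ValueError) as exc:
        # DNS failure, refused connection, timeout, or a malformed URL.
        return f"unreachable: {exc}"
```

Call it as `check_space("https://your-space.hf.space")` from the training host. An "unreachable" result points at the URL or the network, while an HTTP error status suggests the Space exists but is paused, private, or still building.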
#### 3. Missing Metrics
- Check that `enable_tracking=True` in your config
- Verify the Trackio URL is correct
- Check training logs for monitoring errors
#### 4. Authentication Issues
- If using tokens, verify they're correct
- Check Hugging Face account permissions
- Ensure Space is public or you have access
### Debug Mode
Enable debug logging in your training script:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
### Manual Testing
Test the Trackio interface manually:
1. Create an experiment
2. Log some test metrics
3. View the experiment details
4. Update the status
## Security Considerations
### Public vs Private Spaces
- **Public Spaces**: Anyone can view and use the interface
- **Private Spaces**: Only you and collaborators can access
### Token Management
- Store tokens securely (environment variables)
- Don't commit tokens to version control
- Use Hugging Face's token management
### Data Privacy
- Trackio stores experiment data in the Space
- Consider data retention policies
- Be mindful of sensitive information in experiment names
## Performance Optimization
### Space Configuration
- Use CPU (Basic) for the interface (sufficient for tracking)
- Consider GPU only for actual training
- Monitor Space usage and limits
### Efficient Logging
- Log metrics at reasonable intervals (every 10-100 steps)
- Avoid logging too frequently to prevent rate limiting
- Use batch logging when possible
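
The interval and batching advice can be combined in a small wrapper. The sketch below is illustrative only: `BufferedMetricLogger` and its `send_batch` callback are hypothetical and not part of `SmolLM3Monitor`. In practice `send_batch` might loop over the records and call `monitor.log_metrics`.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[int, Dict[str, float]]


class BufferedMetricLogger:
    """Keep every `interval`-th step and send the kept records in batches."""

    def __init__(self, send_batch: Callable[[List[Record]], None],
                 interval: int = 10, batch_size: int = 5):
        self.send_batch = send_batch  # hypothetical sink for a list of records
        self.interval = interval
        self.batch_size = batch_size
        self.buffer: List[Record] = []

    def log(self, metrics: Dict[str, float], step: int) -> None:
        if step % self.interval != 0:
            return  # skip steps between logging intervals
        self.buffer.append((step, dict(metrics)))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; call once more at the end of training."""
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []


sent: List[List[Record]] = []
logger = BufferedMetricLogger(sent.append, interval=10, batch_size=5)
for step in range(1, 101):
    logger.log({"loss": 1.0 / step}, step)
logger.flush()
print(len(sent), "batches,", sum(len(b) for b in sent), "records")  # 2 batches, 10 records
```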
## Monitoring Best Practices
### Experiment Naming
Use descriptive names:
- `smollm3_baseline_v1`
- `smollm3_high_lr_experiment`
- `smollm3_dpo_training`
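
To keep names consistent across runs, they can be generated rather than typed. `make_experiment_name` below is a hypothetical helper; the `model_variant_vN` pattern is one possible convention modeled on the examples above, not a Trackio requirement.

```python
import re


def make_experiment_name(model: str, variant: str, version: int = 1) -> str:
    """Join the parts into a lowercase, underscore-separated name."""
    parts = [model, variant, f"v{version}"]
    cleaned = [re.sub(r"[^a-z0-9]+", "_", p.lower()).strip("_") for p in parts]
    return "_".join(c for c in cleaned if c)


print(make_experiment_name("SmolLM3", "baseline"))    # smollm3_baseline_v1
print(make_experiment_name("SmolLM3", "High LR", 2))  # smollm3_high_lr_v2
```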
### Metric Logging
Log relevant metrics:
- Training loss
- Validation loss
- Learning rate
- GPU memory usage
- Training time
### Status Management
- Mark experiments as "running" when starting
- Update to "completed" when finished
- Mark as "failed" if errors occur
- Use "paused" for temporary stops
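
The first three rules can be automated with a context manager so that a crashed run is never left marked as "running". This is a sketch under assumptions: `tracked_run` is hypothetical, and `update_status` stands for whatever callable you use to report a status string (this guide does not document such a method on `SmolLM3Monitor`).

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def tracked_run(update_status: Callable[[str], None]) -> Iterator[None]:
    """Report "running" on entry, then "completed" or "failed" on exit."""
    update_status("running")
    try:
        yield
    except Exception:
        update_status("failed")
        raise  # keep the original error visible to the caller
    else:
        update_status("completed")


history = []
with tracked_run(history.append):
    pass  # the training loop goes here
print(history)  # ['running', 'completed']
```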
## Integration Examples
### Basic Integration
```python
from monitoring import SmolLM3Monitor

# Initialize monitor
monitor = SmolLM3Monitor(
    experiment_name="my_experiment",
    trackio_url="https://your-space.hf.space",
    enable_tracking=True
)
# Log configuration (config_dict: your training configuration)
monitor.log_config(config_dict)
# Log metrics during training
monitor.log_metrics({"loss": 0.5}, step=100)
# Log final results (final_results: your end-of-training summary)
monitor.log_training_summary(final_results)
```
### Advanced Integration
```python
from monitoring import SmolLM3Monitor

# Custom monitoring setup
monitor = SmolLM3Monitor(
    experiment_name="smollm3_advanced",
    trackio_url="https://your-space.hf.space",
    enable_tracking=True,
    log_artifacts=True,
    log_metrics=True,
    log_config=True
)
# Log system metrics (current_step: the training loop's step counter)
monitor.log_system_metrics(step=current_step)
# Log model checkpoint
monitor.log_model_checkpoint("checkpoint-1000", step=1000)
# Log evaluation results (eval_results: output of your evaluation run)
monitor.log_evaluation_results(eval_results, step=1000)
```
## Support and Resources
### Documentation
- [Hugging Face Spaces Documentation](https://huggingface.co/docs/hub/spaces)
- [Gradio Documentation](https://gradio.app/docs/)
- [Trackio GitHub Repository](https://github.com/Josephrp/trackio)
### Community
- [Hugging Face Forums](https://discuss.huggingface.co/)
- [Gradio Discord](https://discord.gg/feTf9z3Z)
### Issues and Feedback
- Report issues on the project repository
- Provide feedback on the Trackio interface
- Suggest improvements for the monitoring system
## Conclusion
You now have a complete Trackio monitoring system deployed on Hugging Face Spaces! This setup provides:
- Easy experiment tracking and monitoring
- Real-time metric logging
- Web-based interface for experiment management
- Integration with your SmolLM3 fine-tuning pipeline
- Scalable and accessible monitoring solution

Start tracking your experiments and gain insights into your model training process!