Trackio Integration for SmolLM3 Fine-tuning
This document describes the Trackio experiment tracking and monitoring integration for the SmolLM3 fine-tuning pipeline.
Features
- SmolLM3 Fine-tuning: Support for supervised fine-tuning and DPO training
- Trackio Integration: Complete experiment tracking and monitoring
- Hugging Face Spaces Deployment: Easy deployment of Trackio monitoring interface
- Comprehensive Logging: Metrics, parameters, artifacts, and system monitoring
- Flexible Configuration: Support for various training configurations
Quick Start
1. Install Dependencies
pip install -r requirements.txt
2. Basic Training with Trackio
python train.py config/train_smollm3.py \
--dataset_dir my_dataset \
--enable_tracking \
--trackio_url "https://your-trackio-instance.com" \
--experiment_name "smollm3_finetune_v1"
3. Training with Custom Parameters
python train.py config/train_smollm3.py \
--dataset_dir my_dataset \
--batch_size 8 \
--learning_rate 1e-5 \
--max_iters 2000 \
--enable_tracking \
--trackio_url "https://your-trackio-instance.com" \
--experiment_name "smollm3_low_lr_experiment"
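The flags used in the commands above could be parsed along the following lines. This is a minimal sketch, not the actual `train.py`: the flag names come from the commands above, while the `build_parser` helper and the `None` defaults (meaning "keep the value from the config file") are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the Quick Start commands."""
    parser = argparse.ArgumentParser(description="SmolLM3 fine-tuning (sketch)")
    parser.add_argument("config", help="Path to the config file")
    parser.add_argument("--dataset_dir", required=True)
    # A default of None is assumed to mean "use the value from the config file".
    parser.add_argument("--batch_size", type=int, default=None)
    parser.add_argument("--learning_rate", type=float, default=None)
    parser.add_argument("--max_iters", type=int, default=None)
    parser.add_argument("--enable_tracking", action="store_true")
    parser.add_argument("--trackio_url", default=None)
    parser.add_argument("--experiment_name", default=None)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "config/train_smollm3.py",
    "--dataset_dir", "my_dataset",
    "--learning_rate", "1e-5",
    "--enable_tracking",
])
print(args.learning_rate, args.enable_tracking, args.batch_size)
```

Flags that are omitted stay `None`, which lets the caller tell "not given" apart from an explicit value.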
Trackio Integration
Configuration
Add Trackio settings to your configuration:
# In your config file
config = SmolLM3Config(
    # ... other settings ...
    # Trackio monitoring configuration
    enable_tracking=True,
    trackio_url="https://your-trackio-instance.com",
    trackio_token="your_token_here",  # Optional
    log_artifacts=True,
    log_metrics=True,
    log_config=True,
    experiment_name="my_experiment"
)
Environment Variables
You can also set Trackio configuration via environment variables:
export TRACKIO_URL="https://your-trackio-instance.com"
export TRACKIO_TOKEN="your_token_here"
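One way these variables might be combined with explicit settings is sketched below. The `resolve_trackio_settings` helper and the precedence (explicit argument first, environment second) are assumptions; only the variable names `TRACKIO_URL` and `TRACKIO_TOKEN` come from this document.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_trackio_settings(
    url: Optional[str] = None,
    token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[Optional[str], Optional[str]]:
    """Prefer explicit arguments, then fall back to the environment (assumed order)."""
    resolved_url = url or env.get("TRACKIO_URL")
    resolved_token = token or env.get("TRACKIO_TOKEN")
    return resolved_url, resolved_token


# A plain dict stands in for the process environment in this example.
fake_env = {"TRACKIO_URL": "https://your-trackio-instance.com"}
url, token = resolve_trackio_settings(env=fake_env)
print(url, token)
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.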
What Gets Tracked
- Configuration: All training parameters and model settings
- Metrics: Loss, accuracy, learning rate, and custom metrics
- System Metrics: GPU memory, CPU usage, training time
- Artifacts: Model checkpoints, evaluation results
- Training Summary: Final results and experiment duration
Hugging Face Spaces Deployment
Deploy Trackio Monitoring Interface
Create a new Space on Hugging Face:
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Choose "Gradio" as the SDK
- Set visibility (Public or Private)
Upload the deployment files:
- app.py: The Gradio interface
- requirements_space.txt: Dependencies
- README.md: Documentation
Configure the Space:
- The Space will automatically install dependencies
- The Gradio interface will be available at your Space URL
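As an alternative to the web interface, the same steps could be scripted with the `huggingface_hub` client. This is a sketch under assumptions: the `deploy_space` helper and the example repository id are hypothetical, and renaming `requirements_space.txt` to `requirements.txt` inside the Space is an assumption based on Spaces installing Python dependencies from a file with that name.

```python
from typing import Dict

# Local file -> path inside the Space repository.
# The rename to requirements.txt is an assumption (see the note above).
SPACE_FILES: Dict[str, str] = {
    "app.py": "app.py",
    "requirements_space.txt": "requirements.txt",
    "README.md": "README.md",
}


def deploy_space(repo_id: str, private: bool = False) -> None:
    """Create a Gradio Space if it does not exist, then upload the deployment files."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the locally stored token or the HF_TOKEN variable
    api.create_repo(
        repo_id=repo_id,
        repo_type="space",
        space_sdk="gradio",
        private=private,
        exist_ok=True,
    )
    for local_path, path_in_repo in SPACE_FILES.items():
        api.upload_file(
            path_or_fileobj=local_path,
            path_in_repo=path_in_repo,
            repo_id=repo_id,
            repo_type="space",
        )


# Example call (not executed here; it needs network access and a valid token):
# deploy_space("your-username/trackio-monitoring")
```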
Using the Trackio Space
- Create Experiments: Use the "Create Experiment" tab to start new experiments
- Log Metrics: Use the "Log Metrics" tab to track training progress
- View Results: Use the "View Experiments" tab to see experiment details
- Update Status: Use the "Update Status" tab to mark experiments as completed
Integration with Your Training
To connect your training script to the Trackio Space:
# In your training script
from monitoring import SmolLM3Monitor
# Initialize monitor
monitor = SmolLM3Monitor(
    experiment_name="my_experiment",
    trackio_url="https://your-space.hf.space",  # Your Space URL
    enable_tracking=True
)
# Log configuration
monitor.log_config(config_dict)
# Log metrics during training
monitor.log_metrics({"loss": 0.5, "accuracy": 0.85}, step=100)
# Log final results
monitor.log_training_summary(final_results)
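To show where these calls would sit in a training loop, the sketch below uses a hypothetical in-memory stand-in, `RecordingMonitor`, that exposes the same three method names used above. It is not the real `SmolLM3Monitor`, and the decaying loss is a made-up placeholder rather than a real training signal.

```python
from typing import Any, Dict, List, Tuple


class RecordingMonitor:
    """Hypothetical stand-in for SmolLM3Monitor that records calls in memory."""

    def __init__(self, experiment_name: str, enable_tracking: bool = True) -> None:
        self.experiment_name = experiment_name
        self.enable_tracking = enable_tracking
        self.events: List[Tuple[str, Any]] = []

    def log_config(self, config: Dict[str, Any]) -> None:
        if self.enable_tracking:
            self.events.append(("config", dict(config)))

    def log_metrics(self, metrics: Dict[str, float], step: int) -> None:
        if self.enable_tracking:
            self.events.append(("metrics", {"step": step, **metrics}))

    def log_training_summary(self, summary: Dict[str, Any]) -> None:
        if self.enable_tracking:
            self.events.append(("summary", dict(summary)))


def train(monitor: RecordingMonitor, max_iters: int = 300, log_every: int = 100) -> float:
    """Toy loop: log the config once, metrics every log_every steps, a summary at the end."""
    monitor.log_config({"max_iters": max_iters, "log_every": log_every})
    loss = 1.0
    for step in range(1, max_iters + 1):
        loss *= 0.99  # placeholder for a real optimizer step
        if step % log_every == 0:
            monitor.log_metrics({"loss": loss}, step=step)
    monitor.log_training_summary({"final_loss": loss, "steps": max_iters})
    return loss


monitor = RecordingMonitor(experiment_name="my_experiment")
final_loss = train(monitor)
print(len(monitor.events), "events recorded")
```

Logging every `log_every` steps rather than every step keeps the number of requests to the tracking backend small.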
Configuration Files
Main Configuration (config/train_smollm3.py)
from dataclasses import dataclass
from typing import Optional

@dataclass
class SmolLM3Config:
    # Model configuration
    model_name: str = "HuggingFaceTB/SmolLM3-3B"
    max_seq_length: int = 4096
    # Training configuration
    batch_size: int = 4
    learning_rate: float = 2e-5
    max_iters: int = 1000
    # Trackio monitoring
    enable_tracking: bool = True
    trackio_url: Optional[str] = None
    trackio_token: Optional[str] = None
    experiment_name: Optional[str] = None
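A configuration like this can be overridden per run and turned into a plain dictionary for logging. The sketch below redefines a trimmed copy of the dataclass so that it runs on its own; that `log_config` accepts a plain dict like `config_dict` is an assumption based on the integration example in this document.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass
class SmolLM3Config:
    # Trimmed copy of the fields shown above, so the example is self-contained.
    model_name: str = "HuggingFaceTB/SmolLM3-3B"
    batch_size: int = 4
    learning_rate: float = 2e-5
    max_iters: int = 1000
    enable_tracking: bool = True
    trackio_url: Optional[str] = None
    experiment_name: Optional[str] = None


base = SmolLM3Config()
# replace() returns a modified copy and leaves the base config untouched.
run_config = replace(
    base, batch_size=8, learning_rate=1e-5, experiment_name="smollm3_finetune_v1"
)
# asdict() yields a plain dict that can be serialized or logged.
config_dict = asdict(run_config)
print(config_dict)
```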
DPO Configuration (config/train_smollm3_dpo.py)
@dataclass
class SmolLM3DPOConfig(SmolLM3Config):
    # DPO-specific settings
    beta: float = 0.1
    max_prompt_length: int = 2048
    # Trackio monitoring (inherited)
    enable_tracking: bool = True
    trackio_url: Optional[str] = None
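For context on the `beta` setting, the standard DPO objective for a single preference pair can be sketched as below. This illustrates the published loss, not this repository's training code; the `dpo_loss` function and its argument names are chosen for the example.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) computed as softplus(-margin) for numerical stability.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The policy prefers the chosen answer more strongly than the reference does.
low_beta = dpo_loss(-5.0, -9.0, -6.0, -8.0, beta=0.1)
high_beta = dpo_loss(-5.0, -9.0, -6.0, -8.0, beta=0.5)
print(low_beta, high_beta)
```

A larger `beta` scales the log-ratio margin more strongly, so the same preference gap moves the loss further from its neutral value of log 2.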
Monitoring Features
Real-time Metrics
- Training loss and evaluation metrics
- Learning rate scheduling
- GPU memory and utilization
- Training time and progress
Artifact Tracking
- Model checkpoints at regular intervals
- Evaluation results and plots
- Configuration snapshots
- Training logs and summaries
Experiment Management
- Experiment naming and organization
- Status tracking (running, completed, failed)
- Parameter comparison across experiments
- Result visualization
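The status values listed above (running, completed, failed) could be maintained automatically around a training run. The sketch below is a hypothetical pattern: `Status`, `tracked_run`, and the in-memory `registry` are illustrative names, not part of the Trackio interface.

```python
from contextlib import contextmanager
from enum import Enum
from typing import Dict, Iterator


class Status(str, Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@contextmanager
def tracked_run(registry: Dict[str, Status], experiment_name: str) -> Iterator[None]:
    """Mark the experiment as running, then completed or failed depending on the outcome."""
    registry[experiment_name] = Status.RUNNING
    try:
        yield
    except Exception:
        registry[experiment_name] = Status.FAILED
        raise  # re-raise so the failure stays visible to the caller
    else:
        registry[experiment_name] = Status.COMPLETED


registry: Dict[str, Status] = {}

with tracked_run(registry, "smollm3_finetune_v1"):
    pass  # training would run here

try:
    with tracked_run(registry, "broken_run"):
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass

print(registry)
```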
Advanced Usage
Custom Metrics
# Log custom metrics
monitor.log_metrics({
    "custom_metric": value,
    "perplexity": perplexity_score,
    "bleu_score": bleu_score
}, step=current_step)
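A value such as `perplexity_score` can be derived from the cross-entropy loss, since perplexity is the exponential of the mean per-token loss in nats. The helper name `perplexity_from_losses` and the sample losses below are illustrative.

```python
import math
from typing import Sequence


def perplexity_from_losses(token_losses: Sequence[float]) -> float:
    """Perplexity = exp(mean per-token cross-entropy loss, measured in nats)."""
    if not token_losses:
        raise ValueError("token_losses must not be empty")
    return math.exp(sum(token_losses) / len(token_losses))


# Made-up per-token losses, only to exercise the function.
perplexity_score = perplexity_from_losses([2.1, 1.9, 2.0])
metrics = {"perplexity": perplexity_score}
print(metrics)
```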
System Monitoring
# Log system metrics
monitor.log_system_metrics(step=current_step)
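The internals of `log_system_metrics` are not shown in this document. As a rough sketch of what such a collector might gather, the hypothetical `collect_system_metrics` below uses only the standard library, plus PyTorch when it is installed and a CUDA device is visible.

```python
import os
import time
from typing import Dict


def collect_system_metrics(start_time: float) -> Dict[str, float]:
    """Gather a few host metrics; GPU fields appear only if PyTorch sees a CUDA device."""
    metrics: Dict[str, float] = {
        "elapsed_seconds": time.monotonic() - start_time,
        "cpu_count": float(os.cpu_count() or 0),
    }
    if hasattr(os, "getloadavg"):  # not available on Windows
        try:
            metrics["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    try:
        import torch  # optional dependency
    except ImportError:
        return metrics
    if torch.cuda.is_available():
        mib = 1024 ** 2
        metrics["gpu_memory_allocated_mb"] = torch.cuda.memory_allocated() / mib
        metrics["gpu_memory_reserved_mb"] = torch.cuda.memory_reserved() / mib
    return metrics


start = time.monotonic()
system_metrics = collect_system_metrics(start)
print(system_metrics)
```

The resulting dictionary has the same shape as the metric dictionaries passed to `log_metrics` elsewhere in this document.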
Artifact Logging
# Log model checkpoint
monitor.log_model_checkpoint("checkpoint-1000", step=1000)
# Log evaluation results
monitor.log_evaluation_results(eval_results, step=1000)
Troubleshooting
Common Issues
- Trackio not available: Install with pip install trackio
- Connection errors: Check your Trackio URL and token
- Missing metrics: Ensure monitoring is enabled in configuration
- Space deployment issues: Check Gradio version compatibility
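The first issue can be detected up front instead of failing in the middle of a run. The sketch below checks whether the `trackio` package is importable and falls back to running without tracking; the helper names `trackio_available` and `effective_tracking` are hypothetical.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def trackio_available() -> bool:
    """Return True if the trackio package can be imported in this environment."""
    return importlib.util.find_spec("trackio") is not None


def effective_tracking(enable_tracking: bool) -> bool:
    """Disable tracking with a warning instead of crashing when trackio is missing."""
    if enable_tracking and not trackio_available():
        logger.warning(
            "trackio is not installed; run `pip install trackio`. Tracking disabled."
        )
        return False
    return enable_tracking


print("tracking enabled:", effective_tracking(True))
```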
Debug Mode
Enable debug logging:
import logging
logging.basicConfig(level=logging.DEBUG)
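Setting the root logger to DEBUG also enables verbose output from every third-party library. A narrower variant is sketched below; it assumes the monitoring module creates its logger under the name `monitoring`, which this document does not confirm.

```python
import logging

# Keep the root logger at INFO so third-party libraries stay quiet.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
# "monitoring" is an assumed logger name, taken from the module imported earlier.
logging.getLogger("monitoring").setLevel(logging.DEBUG)
logging.getLogger("monitoring").debug("debug logging enabled for monitoring")
```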
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.