# Trackio with Hugging Face Datasets - Complete Guide
## Overview
This guide explains how to use Hugging Face Datasets as persistent storage for Trackio experiments, so experiment data survives Hugging Face Spaces redeployments.
## Architecture
### Why HF Datasets?
1. **Persistent Storage**: Data survives Space restarts and redeployments
2. **Version Control**: Automatic versioning of experiment data
3. **Access Control**: Private datasets for security
4. **Reliability**: HF's infrastructure ensures data availability
5. **Scalability**: Handles large amounts of experiment data
### Data Flow
```
Training Script → Trackio App → HF Dataset → Trackio App → Plots
```
## Setup Instructions
### 1. Create HF Token
1. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
2. Create a new token with `write` permissions
3. Copy the token for use in your Space
### 2. Set Up Dataset Repository
```bash
# Run the setup script
python setup_hf_dataset.py
```
This will:
- Create a private dataset: `tonic/trackio-experiments`
- Add your existing experiments
- Configure the dataset for Trackio
### 3. Configure Hugging Face Space
#### Environment Variables
Set these in your HF Space settings:
```bash
HF_TOKEN=your_hf_token_here
TRACKIO_DATASET_REPO=your-username/your-dataset-name
```
**Environment Variables Explained:**
- `HF_TOKEN`: Your Hugging Face token (required for dataset access)
- `TRACKIO_DATASET_REPO`: Dataset repository to use (optional, defaults to `tonic/trackio-experiments`)
**Example Configurations:**
```bash
# Use the default dataset
HF_TOKEN=your_token_here

# Use a personal dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-username/trackio-experiments

# Use a team dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-org/team-experiments

# Use a project-specific dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-username/smollm3-experiments
```
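Inside the app, these variables can be read with standard `os.environ` lookups. A minimal sketch (the helper name is hypothetical; the fallback repo is the documented default):

```python
import os

def load_trackio_config(env=os.environ):
    """Read Trackio settings from the environment.

    HF_TOKEN is required for dataset access; TRACKIO_DATASET_REPO is
    optional and falls back to the documented default repository.
    """
    token = env.get("HF_TOKEN")  # None means the Space is misconfigured
    repo = env.get("TRACKIO_DATASET_REPO", "tonic/trackio-experiments")
    return token, repo
```

Passing the environment as a parameter keeps the helper easy to test with a plain dict.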
#### Requirements
Update your `requirements.txt`:
```txt
gradio>=4.0.0
plotly>=5.0.0
pandas>=1.5.0
numpy>=1.24.0
datasets>=2.14.0
huggingface-hub>=0.16.0
requests>=2.31.0
```
### 4. Deploy Updated App
The updated `app.py` now:
- Loads experiments from the HF Dataset
- Saves new experiments to the dataset
- Falls back to backup data if the dataset is unavailable
- Provides better error handling
### 5. Configure Environment Variables
Use the configuration script to check your setup:
```bash
python configure_trackio.py
```
This script will:
- Show current environment variables
- Test dataset access
- Generate a configuration file
- Provide usage examples
**Available Environment Variables:**

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `HF_TOKEN` | Yes | None | Your Hugging Face token |
| `TRACKIO_DATASET_REPO` | No | `tonic/trackio-experiments` | Dataset repository to use |
| `SPACE_ID` | Auto | None | HF Space ID (auto-detected) |
## Dataset Schema
The HF Dataset contains these columns:

| Column | Type | Description |
|--------|------|-------------|
| `experiment_id` | string | Unique experiment identifier |
| `name` | string | Experiment name |
| `description` | string | Experiment description |
| `created_at` | string | ISO timestamp |
| `status` | string | running/completed/failed |
| `metrics` | string | JSON array of metric entries |
| `parameters` | string | JSON object of experiment parameters |
| `artifacts` | string | JSON array of artifacts |
| `logs` | string | JSON array of log entries |
| `last_updated` | string | ISO timestamp of last update |
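Since the nested columns are stored as JSON strings, a single row looks like this (an illustrative sketch; the IDs and loss values come from the experiments described later in this guide, all other values are made up):

```python
import json

# One illustrative row matching the schema above; the metrics,
# parameters, artifacts, and logs columns are JSON-encoded strings.
row = {
    "experiment_id": "exp_20250720_130853",
    "name": "petite-elle-l-aime-3",
    "description": "example description",          # illustrative
    "created_at": "2025-07-20T13:08:53",
    "status": "running",
    "metrics": json.dumps([
        {"step": 25, "loss": 1.1659},
        {"step": 50, "loss": 1.1528},
    ]),
    "parameters": json.dumps({"batch_size": 8}),   # illustrative
    "artifacts": json.dumps([]),
    "logs": json.dumps([]),
    "last_updated": "2025-07-20T14:00:00",
}

# The nested structures come back with json.loads
metrics = json.loads(row["metrics"])
```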
## Technical Details
### Loading Experiments
```python
import json
from datasets import load_dataset

# Load the experiments dataset from the HF Hub
dataset = load_dataset("tonic/trackio-experiments", token=HF_TOKEN)

# Convert rows to an experiments dict, decoding the JSON-encoded columns
experiments = {}
for row in dataset['train']:
    experiments[row['experiment_id']] = {
        'name': row['name'],
        'status': row['status'],
        'metrics': json.loads(row['metrics']),
        'parameters': json.loads(row['parameters']),
        'artifacts': json.loads(row['artifacts']),
        'logs': json.loads(row['logs']),
    }
```
### Saving Experiments
```python
import json
from datasets import Dataset

# Convert the experiments dict to the dataset schema,
# JSON-encoding the nested columns
dataset_data = []
for exp_id, exp_data in experiments.items():
    dataset_data.append({
        'experiment_id': exp_id,
        'name': exp_data['name'],
        'status': exp_data['status'],
        'metrics': json.dumps(exp_data['metrics']),
        'parameters': json.dumps(exp_data['parameters']),
        'artifacts': json.dumps(exp_data['artifacts']),
        'logs': json.dumps(exp_data['logs']),
    })

# Push to the HF Hub as a private dataset
dataset = Dataset.from_list(dataset_data)
dataset.push_to_hub("tonic/trackio-experiments", token=HF_TOKEN, private=True)
```
## Your Current Experiments
### Available Experiments
1. **`exp_20250720_130853`** (petite-elle-l-aime-3)
   - 4 metric entries (steps 25, 50, 75, 100)
   - Loss decreasing: 1.1659 → 1.1528
   - Good convergence pattern
2. **`exp_20250720_134319`** (petite-elle-l-aime-3-1)
   - 2 metric entries (step 25)
   - Loss: 1.166
   - GPU memory tracking
### Metrics Available for Plotting
- `loss` - Training loss curve
- `learning_rate` - Learning rate schedule
- `mean_token_accuracy` - Token-level accuracy
- `grad_norm` - Gradient norm
- `num_tokens` - Tokens processed
- `epoch` - Training epoch
- `gpu_0_memory_allocated` - GPU memory usage
- `cpu_percent` - CPU usage
- `memory_percent` - System memory usage
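Because `metrics` is stored as a JSON string, plotting one of these metrics means decoding it and extracting a single series. A minimal sketch (the helper name is hypothetical; the output maps directly onto a Plotly `Scatter(x=steps, y=values)` trace):

```python
import json

def metric_series(metrics_json, metric_name):
    """Extract (steps, values) for one metric from the JSON-encoded
    metrics column, skipping entries that do not record that metric."""
    steps, values = [], []
    for entry in json.loads(metrics_json):
        if metric_name in entry:
            steps.append(entry.get("step"))
            values.append(entry[metric_name])
    return steps, values
```

Skipping entries without the requested key matters because system metrics (CPU, GPU memory) and training metrics may be logged at different steps.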
## Usage Instructions
### 1. View Experiments
- Go to the "View Experiments" tab
- Enter an experiment ID: `exp_20250720_130853` or `exp_20250720_134319`
- Click "View Experiment"
### 2. Create Plots
- Go to the "Visualizations" tab
- Enter the experiment ID
- Select a metric to plot
- Click "Create Plot"
### 3. Compare Experiments
- Use the "Experiment Comparison" feature
- Enter: `exp_20250720_130853,exp_20250720_134319`
- Compare the loss curves
## Troubleshooting
### Issue: "No metrics data available"
**Solutions**:
1. Check that `HF_TOKEN` is set correctly
2. Verify that the dataset repository exists
3. Check network connectivity to the HF Hub
### Issue: "Failed to load from dataset"
**Solutions**:
1. The app falls back to backup data automatically
2. Check dataset permissions
3. Verify that the token has read access
### Issue: "Failed to save experiments"
**Solutions**:
1. Check that the token has write permissions
2. Verify that the dataset repository exists
3. Check network connectivity
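The first items in each list above can be checked locally before suspecting the network. A hypothetical helper (the `hf_` prefix check assumes a standard Hugging Face user access token; the repository format is `owner/name`):

```python
import os

def check_trackio_setup(env=os.environ):
    """Return a list of human-readable problems with the current config."""
    problems = []
    token = env.get("HF_TOKEN")
    if not token:
        problems.append("HF_TOKEN is not set")
    elif not token.startswith("hf_"):
        problems.append("HF_TOKEN does not look like a Hugging Face token")
    repo = env.get("TRACKIO_DATASET_REPO", "tonic/trackio-experiments")
    if repo.count("/") != 1:
        problems.append(f"TRACKIO_DATASET_REPO '{repo}' is not 'owner/name'")
    return problems
```

An empty result means the configuration at least looks sane; dataset existence and permissions still have to be verified against the Hub itself.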
## Benefits of This Approach
### Advantages
- **Persistent**: Data survives Space restarts
- **Reliable**: HF's infrastructure ensures availability
- **Secure**: Private datasets protect your data
- **Scalable**: Handles large amounts of experiment data
- **Versioned**: Automatic versioning of experiment data
### Fallback Strategy
1. **Primary**: Load from the HF Dataset
2. **Secondary**: Use backup data (your existing experiments)
3. **Tertiary**: Create new experiments locally
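The three-tier fallback can be sketched as a chain of loaders tried in order. This sketch injects the loaders as callables so each tier (dataset, backup, fresh start) stays swappable; the function names in the comments are hypothetical:

```python
def load_experiments(loaders):
    """Try each (name, loader) pair in order; the first success wins.
    A loader signals failure by raising or returning None."""
    for name, loader in loaders:
        try:
            result = loader()
            if result is not None:
                return name, result
        except Exception:
            continue  # fall through to the next tier
    return "fresh", {}  # tertiary: start with no experiments

# Hypothetical wiring in the app:
#   tiers = [("dataset", load_from_hf_dataset),  # primary
#            ("backup", load_backup_data)]       # secondary
#   source, experiments = load_experiments(tiers)
```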
## Next Steps
1. **Set HF_TOKEN**: Add your token to the Space environment
2. **Run Setup**: Execute `setup_hf_dataset.py`
3. **Deploy App**: Push the updated `app.py` to your Space
4. **Test Plots**: Verify that experiments load and plots render
5. **Monitor Training**: New experiments will be saved to the dataset
## Security Notes
- The dataset is **private** by default
- It is only accessible with your `HF_TOKEN`
- Experiment data is stored securely on HF infrastructure
- No sensitive data is exposed publicly
---
**Your experiments are now configured for reliable persistence using Hugging Face Datasets!**