# 🚀 Trackio with Hugging Face Datasets - Complete Guide

## Overview

This guide explains how to use Hugging Face Datasets for persistent storage of Trackio experiments, providing reliable data persistence across Hugging Face Spaces deployments.

## 🏗️ Architecture

### Why HF Datasets?

1. **Persistent Storage**: Data survives Space restarts and redeployments
2. **Version Control**: Each push creates a commit in the dataset's Git-based repository, so earlier versions of the experiment data stay recoverable
3. **Access Control**: Private datasets for security
4. **Reliability**: HF's infrastructure ensures data availability
5. **Scalability**: Handles large amounts of experiment data

### Data Flow

```
Training Script → Trackio App → HF Dataset → Trackio App → Plots
```

## 🚀 Setup Instructions

### 1. Create HF Token

1. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
2. Create a new token with `write` permissions
3. Copy the token for use in your Space

### 2. Set Up Dataset Repository

```bash
# Run the setup script
python setup_hf_dataset.py
```

This will:
- Create a private dataset: `tonic/trackio-experiments`
- Add your existing experiments
- Configure the dataset for Trackio

### 3. Configure Hugging Face Space

#### Environment Variables
Set these in your HF Space settings (store `HF_TOKEN` as a secret rather than a plain variable so it is not exposed):
```bash
HF_TOKEN=your_hf_token_here
TRACKIO_DATASET_REPO=your-username/your-dataset-name
```

**Environment Variables Explained:**
- `HF_TOKEN`: Your Hugging Face token (required for dataset access)
- `TRACKIO_DATASET_REPO`: Dataset repository to use (optional, defaults to `tonic/trackio-experiments`)

**Example Configurations:**
```bash
# Use default dataset
HF_TOKEN=your_token_here

# Use personal dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-username/trackio-experiments

# Use team dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-org/team-experiments

# Use project-specific dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-username/smollm3-experiments
```
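
The default described above can be resolved in code along these lines. This is a minimal sketch, not the actual logic in `app.py`: the function name `resolve_trackio_config` and the `env` parameter are hypothetical, and only the two variable names and the default repository come from this guide.

```python
import os

# Default repository named in this guide.
DEFAULT_DATASET_REPO = "tonic/trackio-experiments"


def resolve_trackio_config(env=None):
    """Return (token, dataset_repo) read from an environment mapping.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env

    token = env.get("HF_TOKEN")
    if not token:
        # HF_TOKEN is the only required variable.
        raise RuntimeError("HF_TOKEN is not set; it is required for dataset access")

    # An unset or empty TRACKIO_DATASET_REPO falls back to the default.
    repo = env.get("TRACKIO_DATASET_REPO") or DEFAULT_DATASET_REPO
    return token, repo
```

Calling `resolve_trackio_config()` with no arguments reads the Space's real environment; failing early on a missing token gives a clearer error than a later authentication failure from the Hub.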

#### Requirements
Update your `requirements.txt`:
```txt
gradio>=4.0.0
plotly>=5.0.0
pandas>=1.5.0
numpy>=1.24.0
datasets>=2.14.0
huggingface-hub>=0.16.0
requests>=2.31.0
```

### 4. Deploy Updated App

The updated `app.py` now:
- Loads experiments from HF Dataset
- Saves new experiments to the dataset
- Falls back to backup data if dataset unavailable
- Provides better error handling

### 5. Verify Your Configuration

Use the configuration script to check your setup:

```bash
python configure_trackio.py
```

This script will:
- Show current environment variables
- Test dataset access
- Generate configuration file
- Provide usage examples

**Available Environment Variables:**

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `HF_TOKEN` | Yes | None | Your Hugging Face token |
| `TRACKIO_DATASET_REPO` | No | `tonic/trackio-experiments` | Dataset repository to use |
| `SPACE_ID` | Auto | None | HF Space ID (auto-detected) |

## 📊 Dataset Schema

The HF Dataset contains these columns:

| Column | Type | Description |
|--------|------|-------------|
| `experiment_id` | string | Unique experiment identifier |
| `name` | string | Experiment name |
| `description` | string | Experiment description |
| `created_at` | string | ISO timestamp |
| `status` | string | running/completed/failed |
| `metrics` | string | JSON array of metric entries |
| `parameters` | string | JSON object of experiment parameters |
| `artifacts` | string | JSON array of artifacts |
| `logs` | string | JSON array of log entries |
| `last_updated` | string | ISO timestamp of last update |
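
Because every column is a string, nested fields have to be JSON-encoded on the way in and decoded on the way out. The sketch below shows one way to map an in-memory experiment to a row with exactly these columns and back. The helper names and the default values (empty lists, an empty dict, `"running"`) are assumptions for illustration, not the app's actual behavior.

```python
import json
from datetime import datetime, timezone

# Columns that the schema stores as JSON-encoded strings.
JSON_DEFAULTS = {"metrics": [], "parameters": {}, "artifacts": [], "logs": []}


def experiment_to_row(exp_id, exp):
    """Flatten an experiment dict into an all-string row (hypothetical helper)."""
    now = datetime.now(timezone.utc).isoformat()
    row = {
        "experiment_id": exp_id,
        "name": exp.get("name", ""),
        "description": exp.get("description", ""),
        "created_at": exp.get("created_at", now),
        "status": exp.get("status", "running"),
        "last_updated": now,
    }
    # Nested fields become JSON strings; missing ones get an empty default.
    for column, default in JSON_DEFAULTS.items():
        row[column] = json.dumps(exp.get(column, default))
    return row


def row_to_experiment(row):
    """Inverse of experiment_to_row: decode the JSON columns again."""
    exp = dict(row)
    for column in JSON_DEFAULTS:
        exp[column] = json.loads(row[column])
    return exp
```

Keeping the encode and decode steps in a matched pair like this makes it harder for the save and load paths to drift apart when a column is added.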

## 🔧 Technical Details

### Loading Experiments

```python
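import json
import os

from datasets import load_dataset

# Read the token from the Space environment
HF_TOKEN = os.environ["HF_TOKEN"]

# Load from HF Dataset (a private repository requires the token)
dataset = load_dataset("tonic/trackio-experiments", token=HF_TOKEN)

# Convert rows to an experiments dict keyed by experiment ID
experiments = {}
for row in dataset['train']:
    experiments[row['experiment_id']] = {
        'id': row['experiment_id'],
        # JSON columns are stored as strings and decoded here
        'metrics': json.loads(row['metrics']),
        'parameters': json.loads(row['parameters']),
        # ... other fields
    }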
```

### Saving Experiments

```python
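import json
import os

from datasets import Dataset

# Read the token from the Space environment
HF_TOKEN = os.environ["HF_TOKEN"]

# `experiments` maps experiment IDs to experiment dicts.
# Convert experiments to dataset format (one row per experiment)
dataset_data = []
for exp_id, exp_data in experiments.items():
    dataset_data.append({
        'experiment_id': exp_id,
        # Nested fields are JSON-encoded to match the string columns
        'metrics': json.dumps(exp_data['metrics']),
        'parameters': json.dumps(exp_data['parameters']),
        # ... other fields
    })

# Push to HF Hub, keeping the repository private
dataset = Dataset.from_list(dataset_data)
dataset.push_to_hub("tonic/trackio-experiments", token=HF_TOKEN, private=True)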
```

## 📈 Your Current Experiments

### Available Experiments

1. **`exp_20250720_130853`** (petite-elle-l-aime-3)
   - 4 metric entries (steps 25, 50, 75, 100)
   - Loss decreasing: 1.1659 → 1.1528
   - Good convergence pattern

2. **`exp_20250720_134319`** (petite-elle-l-aime-3-1)
   - 2 metric entries (step 25)
   - Loss: 1.166
   - GPU memory tracking

### Metrics Available for Plotting

- `loss` - Training loss curve
- `learning_rate` - Learning rate schedule
- `mean_token_accuracy` - Token-level accuracy
- `grad_norm` - Gradient norm
- `num_tokens` - Tokens processed
- `epoch` - Training epoch
- `gpu_0_memory_allocated` - GPU memory usage
- `cpu_percent` - CPU usage
- `memory_percent` - System memory
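
To plot one of these metrics, the decoded `metrics` array has to be reduced to a list of steps and a list of values. The sketch below assumes each metric entry looks like `{"step": 25, "metrics": {"loss": ...}}`. That entry shape and the function name `metric_series` are assumptions, since this guide does not spell out the entry format; adjust the key names to match your data.

```python
def metric_series(metric_entries, name):
    """Return (steps, values) for one metric, ordered by step.

    Entries that lack a step or do not report `name` are skipped, which
    matters because not every entry carries every metric.
    """
    points = [
        (entry["step"], entry["metrics"][name])
        for entry in metric_entries
        if "step" in entry and name in entry.get("metrics", {})
    ]
    points.sort()  # order by step so the curve is drawn left to right

    steps = [step for step, _ in points]
    values = [value for _, value in points]
    return steps, values
```

The two returned lists can be passed directly as the x and y data of a line plot.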

## 🎯 Usage Instructions

### 1. View Experiments
- Go to "View Experiments" tab
- Enter experiment ID: `exp_20250720_130853` or `exp_20250720_134319`
- Click "View Experiment"

### 2. Create Plots
- Go to "Visualizations" tab
- Enter experiment ID
- Select metric to plot
- Click "Create Plot"

### 3. Compare Experiments
- Use "Experiment Comparison" feature
- Enter: `exp_20250720_130853,exp_20250720_134319`
- Compare loss curves

## 🔍 Troubleshooting

### Issue: "No metrics data available"
**Solutions**:
1. Check HF_TOKEN is set correctly
2. Verify dataset repository exists
3. Check network connectivity to HF Hub

### Issue: "Failed to load from dataset"
**Solutions** (the app falls back to backup data automatically in the meantime):
1. Check dataset permissions
2. Verify token has read access

### Issue: "Failed to save experiments"
**Solutions**:
1. Check token has write permissions
2. Verify dataset repository exists
3. Check network connectivity

## 🚀 Benefits of This Approach

### ✅ Advantages
- **Persistent**: Data survives Space restarts
- **Reliable**: HF's infrastructure ensures availability
- **Secure**: Private datasets protect your data
- **Scalable**: Handles large amounts of experiment data
- **Versioned**: Automatic versioning of experiment data

### 🔄 Fallback Strategy
1. **Primary**: Load from HF Dataset
2. **Secondary**: Use backup data (your existing experiments)
3. **Tertiary**: Create new experiments locally
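
The three tiers can be expressed as a single function that tries each source in order. This is a sketch of the idea under stated assumptions, not the code in `app.py`: `load_from_dataset` stands for any callable that returns an experiments dict (for example, a wrapper around `load_dataset`), and `backup_experiments` stands for the bundled backup data. Both names are hypothetical.

```python
def load_experiments(load_from_dataset, backup_experiments):
    """Return (experiments, source) using the first tier that yields data."""
    # Primary: the HF Dataset. Any failure (network, auth, missing repo)
    # is caught deliberately so the app keeps running.
    try:
        experiments = load_from_dataset()
        if experiments:
            return experiments, "dataset"
    except Exception as exc:
        print(f"Failed to load from dataset: {exc}")

    # Secondary: backup data shipped with the app.
    if backup_experiments:
        return dict(backup_experiments), "backup"

    # Tertiary: start empty; new experiments are created locally.
    return {}, "local"
```

Returning the source alongside the data lets the UI tell users when they are looking at backup data rather than the live dataset.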

## 📋 Next Steps

1. **Set HF_TOKEN**: Add your token to Space environment
2. **Run Setup**: Execute `setup_hf_dataset.py`
3. **Deploy App**: Push updated `app.py` to your Space
4. **Test Plots**: Verify experiments load and plots work
5. **Monitor Training**: New experiments will be saved to dataset

## 🔐 Security Notes

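- The setup script creates the dataset as **private**
- A private dataset is only accessible with a token that has access to it, such as your `HF_TOKEN`
- Experiment data is stored on HF infrastructure
- As long as the dataset stays private, no experiment data is exposed publicly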

---

**Your experiments are now configured for reliable persistence using Hugging Face Datasets!** 🎉