# 🚀 Trackio with Hugging Face Datasets - Complete Guide

## Overview

This guide explains how to use Hugging Face Datasets for persistent storage of Trackio experiments, providing reliable data persistence across Hugging Face Spaces deployments.

## 🏗️ Architecture

### Why HF Datasets?

1. **Persistent Storage**: Data survives Space restarts and redeployments
2. **Version Control**: Automatic versioning of experiment data
3. **Access Control**: Private datasets for security
4. **Reliability**: HF's infrastructure ensures data availability
5. **Scalability**: Handles large amounts of experiment data

### Data Flow

```
Training Script → Trackio App → HF Dataset → Trackio App → Plots
```

## 🚀 Setup Instructions

### 1. Create HF Token

1. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
2. Create a new token with `write` permissions
3. Copy the token for use in your Space

### 2. Set Up Dataset Repository

```bash
# Run the setup script
python setup_hf_dataset.py
```

This will:
- Create a private dataset: `tonic/trackio-experiments`
- Add your existing experiments
- Configure the dataset for Trackio

### 3. Configure Hugging Face Space

#### Environment Variables
Set these in your HF Space settings:
```bash
HF_TOKEN=your_hf_token_here
TRACKIO_DATASET_REPO=your-username/your-dataset-name
```

**Environment Variables Explained:**
- `HF_TOKEN`: Your Hugging Face token (required for dataset access)
- `TRACKIO_DATASET_REPO`: Dataset repository to use (optional, defaults to `tonic/trackio-experiments`)

**Example Configurations:**
```bash
# Use default dataset
HF_TOKEN=your_token_here

# Use personal dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-username/trackio-experiments

# Use team dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-org/team-experiments

# Use project-specific dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-username/smollm3-experiments
```
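
As a minimal sketch of how these two variables could be resolved inside the app, the snippet below applies the documented default when `TRACKIO_DATASET_REPO` is unset. The names `TrackioConfig` and `resolve_config` are hypothetical helpers for illustration and are not taken from `app.py`.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Default repository named in this guide
DEFAULT_DATASET_REPO = "tonic/trackio-experiments"


class TrackioConfig(NamedTuple):
    """Hypothetical container for the two settings described above."""
    hf_token: Optional[str]
    dataset_repo: str


def resolve_config(env: Mapping[str, str] = os.environ) -> TrackioConfig:
    """Read HF_TOKEN and TRACKIO_DATASET_REPO, falling back to the default repo."""
    token = env.get("HF_TOKEN") or None  # treat an empty string as "not set"
    repo = env.get("TRACKIO_DATASET_REPO") or DEFAULT_DATASET_REPO
    return TrackioConfig(hf_token=token, dataset_repo=repo)


# Example with an explicit mapping instead of the real environment
config = resolve_config({"HF_TOKEN": "your_token_here"})
print(config.dataset_repo)  # prints "tonic/trackio-experiments"
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real Space settings.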

#### Requirements
Update your `requirements.txt`:
```txt
gradio>=4.0.0
plotly>=5.0.0
pandas>=1.5.0
numpy>=1.24.0
datasets>=2.14.0
huggingface-hub>=0.16.0
requests>=2.31.0
```

### 4. Deploy Updated App

The updated `app.py` now:
- Loads experiments from HF Dataset
- Saves new experiments to the dataset
- Falls back to backup data if dataset unavailable
- Provides better error handling

### 5. Configure Environment Variables

Use the configuration script to check your setup:

```bash
python configure_trackio.py
```

This script will:
- Show current environment variables
- Test dataset access
- Generate configuration file
- Provide usage examples

**Available Environment Variables:**

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `HF_TOKEN` | Yes | None | Your Hugging Face token |
| `TRACKIO_DATASET_REPO` | No | `tonic/trackio-experiments` | Dataset repository to use |
| `SPACE_ID` | Auto | None | HF Space ID (auto-detected) |

## 📊 Dataset Schema

The HF Dataset contains these columns:

| Column | Type | Description |
|--------|------|-------------|
| `experiment_id` | string | Unique experiment identifier |
| `name` | string | Experiment name |
| `description` | string | Experiment description |
| `created_at` | string | ISO timestamp |
| `status` | string | running/completed/failed |
| `metrics` | string | JSON array of metric entries |
| `parameters` | string | JSON object of experiment parameters |
| `artifacts` | string | JSON array of artifacts |
| `logs` | string | JSON array of log entries |
| `last_updated` | string | ISO timestamp of last update |
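
Because every column is a string, the structured fields have to be JSON-encoded on write and decoded on read. The sketch below shows one way that round trip might look. The helper names and the default values for missing fields are assumptions for illustration, not the behavior of `app.py`.

```python
import json
from datetime import datetime, timezone

# Columns the schema stores as JSON-encoded strings
JSON_COLUMNS = ("metrics", "parameters", "artifacts", "logs")
# Assumed defaults for fields an experiment does not carry yet
JSON_DEFAULTS = {"metrics": [], "parameters": {}, "artifacts": [], "logs": []}


def experiment_to_row(exp_id, exp):
    """Flatten one in-memory experiment dict into a row of strings."""
    now = datetime.now(timezone.utc).isoformat()
    row = {
        "experiment_id": exp_id,
        "name": exp.get("name", ""),
        "description": exp.get("description", ""),
        "created_at": exp.get("created_at", now),
        "status": exp.get("status", "running"),
        "last_updated": now,
    }
    for column in JSON_COLUMNS:
        row[column] = json.dumps(exp.get(column, JSON_DEFAULTS[column]))
    return row


def row_to_experiment(row):
    """Inverse of experiment_to_row: decode the JSON columns back into objects."""
    exp = {key: value for key, value in row.items() if key not in JSON_COLUMNS}
    for column in JSON_COLUMNS:
        exp[column] = json.loads(row[column])
    return exp


row = experiment_to_row("exp_20250720_130853", {"name": "petite-elle-l-aime-3"})
restored = row_to_experiment(row)
```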

## 🔧 Technical Details

### Loading Experiments

```python
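import json
import os

from datasets import load_dataset

# Token comes from the Space environment (see Setup Instructions)
HF_TOKEN = os.environ["HF_TOKEN"]

# Load from HF Dataset
dataset = load_dataset("tonic/trackio-experiments", token=HF_TOKEN)

# Convert to an experiments dict keyed by experiment ID
experiments = {}
for row in dataset['train']:
    experiments[row['experiment_id']] = {
        'id': row['experiment_id'],
        # JSON columns are stored as strings and must be decoded
        'metrics': json.loads(row['metrics']),
        'parameters': json.loads(row['parameters']),
        # ... other fields
    }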
```

### Saving Experiments

```python
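import json
import os

from datasets import Dataset

# Token comes from the Space environment (see Setup Instructions)
HF_TOKEN = os.environ["HF_TOKEN"]

# Convert experiments to dataset format.
# `experiments` is a dict mapping experiment ID -> experiment data.
dataset_data = []
for exp_id, exp_data in experiments.items():
    dataset_data.append({
        'experiment_id': exp_id,
        # Structured fields are JSON-encoded to match the string columns
        'metrics': json.dumps(exp_data['metrics']),
        'parameters': json.dumps(exp_data['parameters']),
        # ... other fields
    })

# Push to HF Hub as a private dataset
dataset = Dataset.from_list(dataset_data)
dataset.push_to_hub("tonic/trackio-experiments", token=HF_TOKEN, private=True)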
```

## 📈 Your Current Experiments

### Available Experiments

1. **`exp_20250720_130853`** (petite-elle-l-aime-3)
   - 4 metric entries (steps 25, 50, 75, 100)
   - Loss decreasing: 1.1659 → 1.1528
   - Good convergence pattern

2. **`exp_20250720_134319`** (petite-elle-l-aime-3-1)
   - 2 metric entries (step 25)
   - Loss: 1.166
   - GPU memory tracking

### Metrics Available for Plotting

- `loss` - Training loss curve
- `learning_rate` - Learning rate schedule
- `mean_token_accuracy` - Token-level accuracy
- `grad_norm` - Gradient norm
- `num_tokens` - Tokens processed
- `epoch` - Training epoch
- `gpu_0_memory_allocated` - GPU memory usage
- `cpu_percent` - CPU usage
- `memory_percent` - System memory
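
To plot one of these metrics, the decoded `metrics` array has to be turned into parallel lists of steps and values. This guide does not show the shape of a metric entry, so the sketch below assumes each entry looks like `{"step": ..., "metrics": {...}}`. That shape, the function name `metric_series`, and the sample values are all illustrative assumptions.

```python
def metric_series(entries, name):
    """Return (steps, values) for one metric, skipping entries that lack it.

    Assumes each entry is a dict of the form
    {"step": int, "metrics": {metric_name: value}} (hypothetical shape).
    """
    steps, values = [], []
    for entry in sorted(entries, key=lambda e: e["step"]):
        if name in entry["metrics"]:
            steps.append(entry["step"])
            values.append(entry["metrics"][name])
    return steps, values


# Illustrative entries only; not real logged data
entries = [
    {"step": 100, "metrics": {"loss": 1.1528}},
    {"step": 25, "metrics": {"loss": 1.1659, "cpu_percent": 12.0}},
]
steps, losses = metric_series(entries, "loss")
print(steps, losses)  # prints "[25, 100] [1.1659, 1.1528]"
```

The two lists can then be passed as the x and y values of a line plot.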

## 🎯 Usage Instructions

### 1. View Experiments
- Go to "View Experiments" tab
- Enter experiment ID: `exp_20250720_130853` or `exp_20250720_134319`
- Click "View Experiment"

### 2. Create Plots
- Go to "Visualizations" tab
- Enter experiment ID
- Select metric to plot
- Click "Create Plot"

### 3. Compare Experiments
- Use "Experiment Comparison" feature
- Enter: `exp_20250720_130853,exp_20250720_134319`
- Compare loss curves

## 🔍 Troubleshooting

### Issue: "No metrics data available"
**Solutions**:
1. Check that `HF_TOKEN` is set correctly
2. Verify dataset repository exists
3. Check network connectivity to HF Hub

### Issue: "Failed to load from dataset"
**Solutions**:
1. No immediate action is needed to keep the app running: it falls back to backup data automatically
2. Check dataset permissions
3. Verify token has read access

### Issue: "Failed to save experiments"
**Solutions**:
1. Check token has write permissions
2. Verify dataset repository exists
3. Check network connectivity

## 🚀 Benefits of This Approach

### ✅ Advantages
- **Persistent**: Data survives Space restarts
- **Reliable**: HF's infrastructure ensures availability
- **Secure**: Private datasets protect your data
- **Scalable**: Handles large amounts of experiment data
- **Versioned**: Automatic versioning of experiment data

### 🔄 Fallback Strategy
1. **Primary**: Load from HF Dataset
2. **Secondary**: Use backup data (your existing experiments)
3. **Tertiary**: Create new experiments locally
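
The three tiers above can be sketched as an ordered list of loaders tried in turn. This is a sketch under stated assumptions, not the logic in `app.py`: `load_with_fallback` and the two stand-in loader functions are hypothetical, and the real primary loader would call the HF Dataset instead of raising.

```python
def load_with_fallback(loaders):
    """Try each (label, loader) pair in order; return the first non-empty result.

    If every loader fails or returns nothing, start with an empty local store.
    """
    for label, loader in loaders:
        try:
            experiments = loader()
        except Exception as exc:  # e.g. network or permission errors
            print(f"{label} source failed: {exc}")
            continue
        if experiments:
            return label, experiments
    return "local", {}


def load_from_dataset():
    # Stand-in for the HF Dataset loader; simulates an unavailable dataset
    raise ConnectionError("dataset unavailable")


def load_backup():
    # Stand-in for the bundled backup experiments
    return {"exp_20250720_130853": {"name": "petite-elle-l-aime-3"}}


source, experiments = load_with_fallback(
    [("primary", load_from_dataset), ("secondary", load_backup)]
)
print(source)  # prints "secondary"
```

Keeping the loaders as plain callables makes each tier replaceable and lets the order be changed in one place.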

## 📋 Next Steps

1. **Set HF_TOKEN**: Add your token to Space environment
2. **Run Setup**: Execute `setup_hf_dataset.py`
3. **Deploy App**: Push updated `app.py` to your Space
4. **Test Plots**: Verify experiments load and plots work
5. **Monitor Training**: New experiments will be saved to dataset

## 🔐 Security Notes

- Dataset is **private** by default
- Only accessible with your HF_TOKEN
- Experiment data is stored securely on HF infrastructure
- No sensitive data is exposed publicly

---

**Your experiments are now configured for reliable persistence using Hugging Face Datasets!** 🎉