# Trackio with Hugging Face Datasets - Complete Guide

## Overview

This guide explains how to use Hugging Face Datasets for persistent storage of Trackio experiments, providing reliable data persistence across Hugging Face Spaces deployments.
## Architecture

### Why HF Datasets?

1. **Persistent Storage**: Data survives Space restarts and redeployments
2. **Version Control**: Automatic versioning of experiment data
3. **Access Control**: Private datasets for security
4. **Reliability**: HF's infrastructure ensures data availability
5. **Scalability**: Handles large amounts of experiment data

### Data Flow

```
Training Script → Trackio App → HF Dataset → Trackio App → Plots
```
## Setup Instructions

### 1. Create HF Token

1. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens)
2. Create a new token with `write` permissions
3. Copy the token for use in your Space

### 2. Set Up Dataset Repository

```bash
# Run the setup script
python setup_hf_dataset.py
```

This will:

- Create a private dataset: `tonic/trackio-experiments`
- Add your existing experiments
- Configure the dataset for Trackio
### 3. Configure Hugging Face Space

#### Environment Variables

Set these in your HF Space settings:

```bash
HF_TOKEN=your_hf_token_here
TRACKIO_DATASET_REPO=your-username/your-dataset-name
```

**Environment Variables Explained:**

- `HF_TOKEN`: Your Hugging Face token (required for dataset access)
- `TRACKIO_DATASET_REPO`: Dataset repository to use (optional, defaults to `tonic/trackio-experiments`)

**Example Configurations:**

```bash
# Use default dataset
HF_TOKEN=your_token_here

# Use personal dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-username/trackio-experiments

# Use team dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-org/team-experiments

# Use project-specific dataset
HF_TOKEN=your_token_here
TRACKIO_DATASET_REPO=your-username/smollm3-experiments
```
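
To make the default behaviour concrete, here is a minimal sketch of how an app could resolve these two variables. The names `TrackioConfig` and `load_config` are hypothetical and are not taken from `app.py`; only the variable names and the default repository come from this guide.

```python
import os
from dataclasses import dataclass

# Default repository named in this guide
DEFAULT_DATASET_REPO = "tonic/trackio-experiments"


@dataclass(frozen=True)
class TrackioConfig:
    """Hypothetical container for the two settings described above."""
    hf_token: str
    dataset_repo: str


def load_config(env=None):
    """Read Trackio settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    # HF_TOKEN is required: fail early with a clear message
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to your Space settings")

    # TRACKIO_DATASET_REPO is optional and falls back to the default
    repo = (env.get("TRACKIO_DATASET_REPO") or "").strip() or DEFAULT_DATASET_REPO
    if repo.count("/") != 1 or not all(repo.split("/")):
        raise ValueError(f"expected 'owner/name', got {repo!r}")

    return TrackioConfig(hf_token=token, dataset_repo=repo)
```

Passing a plain dict instead of `os.environ` makes the function easy to check locally, for example `load_config({"HF_TOKEN": "your_token_here"})`.
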
#### Requirements

Update your `requirements.txt`:

```txt
gradio>=4.0.0
plotly>=5.0.0
pandas>=1.5.0
numpy>=1.24.0
datasets>=2.14.0
huggingface-hub>=0.16.0
requests>=2.31.0
```
### 4. Deploy Updated App

The updated `app.py` now:

- Loads experiments from the HF Dataset
- Saves new experiments to the dataset
- Falls back to backup data if the dataset is unavailable
- Provides better error handling
### 5. Configure Environment Variables

Use the configuration script to check your setup:

```bash
python configure_trackio.py
```

This script will:

- Show current environment variables
- Test dataset access
- Generate a configuration file
- Provide usage examples

**Available Environment Variables:**

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `HF_TOKEN` | Yes | None | Your Hugging Face token |
| `TRACKIO_DATASET_REPO` | No | `tonic/trackio-experiments` | Dataset repository to use |
| `SPACE_ID` | Auto | None | HF Space ID (auto-detected) |
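
The "show current environment variables" step can be approximated as below. This is a sketch, not the contents of `configure_trackio.py`: `ENV_SPEC`, `mask`, and `describe_environment` are hypothetical names, and the rows simply mirror the table above. The token is masked so the report is safe to paste into logs.

```python
import os

# (name, required, default) rows mirroring the table above
ENV_SPEC = [
    ("HF_TOKEN", True, None),
    ("TRACKIO_DATASET_REPO", False, "tonic/trackio-experiments"),
    ("SPACE_ID", False, None),
]
SECRET_NAMES = {"HF_TOKEN"}


def mask(value):
    """Hide a secret, showing at most its last 4 characters."""
    return "*" * 4 if len(value) <= 4 else "*" * (len(value) - 4) + value[-4:]


def describe_environment(env=None):
    """Return one status line per known variable."""
    env = os.environ if env is None else env
    lines = []
    for name, required, default in ENV_SPEC:
        value = env.get(name)
        if value:
            shown = mask(value) if name in SECRET_NAMES else value
            status = f"set ({shown})"
        elif required:
            status = "MISSING (required)"
        elif default is not None:
            status = f"unset, using default {default}"
        else:
            status = "unset"
        lines.append(f"{name}: {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe_environment()))
```
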
## Dataset Schema

The HF Dataset contains these columns:

| Column | Type | Description |
|--------|------|-------------|
| `experiment_id` | string | Unique experiment identifier |
| `name` | string | Experiment name |
| `description` | string | Experiment description |
| `created_at` | string | ISO timestamp |
| `status` | string | running/completed/failed |
| `metrics` | string | JSON array of metric entries |
| `parameters` | string | JSON object of experiment parameters |
| `artifacts` | string | JSON array of artifacts |
| `logs` | string | JSON array of log entries |
| `last_updated` | string | ISO timestamp of last update |
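
Because every column is a string, nested fields have to be JSON-encoded on the way in and decoded on the way out. The sketch below shows one way to do that round trip for the ten columns listed above. The helper names and the fallback values for missing fields (empty strings, `"running"`, the current UTC time) are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone

# Columns stored as JSON strings, with the empty value assumed for each
JSON_DEFAULTS = {"metrics": [], "parameters": {}, "artifacts": [], "logs": []}


def experiment_to_row(exp_id, exp):
    """Flatten an experiment dict into an all-string row matching the schema."""
    now = datetime.now(timezone.utc).isoformat()
    row = {
        "experiment_id": exp_id,
        "name": exp.get("name", ""),
        "description": exp.get("description", ""),
        "created_at": exp.get("created_at", now),
        "status": exp.get("status", "running"),
        "last_updated": now,
    }
    for column, empty in JSON_DEFAULTS.items():
        row[column] = json.dumps(exp.get(column, empty))
    return row


def row_to_experiment(row):
    """Inverse of experiment_to_row: decode the JSON columns again."""
    exp = {key: value for key, value in row.items() if key not in JSON_DEFAULTS}
    exp["id"] = exp.pop("experiment_id")
    for column in JSON_DEFAULTS:
        exp[column] = json.loads(row[column])
    return exp
```
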
## Technical Details

### Loading Experiments

```python
import json
import os

from datasets import load_dataset

# Token comes from the Space environment (see Setup Instructions)
HF_TOKEN = os.environ["HF_TOKEN"]

# Load from HF Dataset
dataset = load_dataset("tonic/trackio-experiments", token=HF_TOKEN)

# Convert to experiments dict, keyed by experiment ID
experiments = {}
for row in dataset['train']:
    experiments[row['experiment_id']] = {
        'id': row['experiment_id'],
        'metrics': json.loads(row['metrics']),
        'parameters': json.loads(row['parameters']),
        # ... other fields
    }
```
### Saving Experiments

```python
import json
import os

from datasets import Dataset

# Token comes from the Space environment (see Setup Instructions)
HF_TOKEN = os.environ["HF_TOKEN"]

# `experiments` is the app's dict of experiment data, keyed by experiment ID
# Convert experiments to dataset format
dataset_data = []
for exp_id, exp_data in experiments.items():
    dataset_data.append({
        'experiment_id': exp_id,
        'metrics': json.dumps(exp_data['metrics']),
        'parameters': json.dumps(exp_data['parameters']),
        # ... other fields
    })

# Push to HF Hub
dataset = Dataset.from_list(dataset_data)
dataset.push_to_hub("tonic/trackio-experiments", token=HF_TOKEN, private=True)
```
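
Since `push_to_hub` uploads the dataset it is given rather than appending to what is already on the Hub, it is worth merging existing rows with new ones before pushing. The helper below is a hypothetical sketch of that merge step (`merge_rows` is not part of Trackio or `datasets`); it upserts by `experiment_id`.

```python
def merge_rows(existing_rows, new_rows):
    """Upsert by experiment_id: a new row replaces an existing row with the same ID.

    Rows keep the position of their first appearance; unseen IDs are appended.
    """
    merged = {row["experiment_id"]: row for row in existing_rows}
    for row in new_rows:
        merged[row["experiment_id"]] = row
    return list(merged.values())
```

The merged list can then be passed to `Dataset.from_list(...)` as in the saving example, so experiments already stored in the dataset are not dropped when a new one is saved.
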
## Your Current Experiments

### Available Experiments

1. **`exp_20250720_130853`** (petite-elle-l-aime-3)
   - 4 metric entries (steps 25, 50, 75, 100)
   - Loss decreasing: 1.1659 → 1.1528
   - Good convergence pattern
2. **`exp_20250720_134319`** (petite-elle-l-aime-3-1)
   - 2 metric entries (step 25)
   - Loss: 1.166
   - GPU memory tracking

### Metrics Available for Plotting

- `loss` - Training loss curve
- `learning_rate` - Learning rate schedule
- `mean_token_accuracy` - Token-level accuracy
- `grad_norm` - Gradient norm
- `num_tokens` - Tokens processed
- `epoch` - Training epoch
- `gpu_0_memory_allocated` - GPU memory usage
- `cpu_percent` - CPU usage
- `memory_percent` - System memory
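
To plot one of these metrics, the decoded `metrics` array has to be turned into x and y series. This guide does not spell out the layout of a metric entry, so the sketch below assumes each entry looks like `{"step": 25, "metrics": {"loss": 1.17}}`. Both that shape and the function name `metric_series` are assumptions; adjust the lookups if the real entries differ.

```python
def metric_series(metric_entries, name):
    """Return (steps, values) for one metric, sorted by step.

    Assumes each entry is a dict with a "step" key and a nested "metrics"
    dict; entries that lack the step or the requested metric are skipped.
    """
    points = []
    for entry in metric_entries:
        values = entry.get("metrics", {})
        if name in values and entry.get("step") is not None:
            points.append((entry["step"], values[name]))
    points.sort(key=lambda point: point[0])
    steps = [step for step, _ in points]
    series = [value for _, value in points]
    return steps, series
```

The two returned lists can be passed directly as the `x` and `y` arguments of a Plotly line chart.
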
## Usage Instructions

### 1. View Experiments

- Go to the "View Experiments" tab
- Enter an experiment ID: `exp_20250720_130853` or `exp_20250720_134319`
- Click "View Experiment"

### 2. Create Plots

- Go to the "Visualizations" tab
- Enter the experiment ID
- Select the metric to plot
- Click "Create Plot"

### 3. Compare Experiments

- Use the "Experiment Comparison" feature
- Enter: `exp_20250720_130853,exp_20250720_134319`
- Compare loss curves
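
The comparison field takes a comma-separated list of IDs. How `app.py` parses that input is not shown in this guide; the sketch below is one tolerant way to do it, with `parse_experiment_ids` as a hypothetical helper name. It ignores stray spaces, empty items, and repeated IDs.

```python
def parse_experiment_ids(text):
    """Split a comma-separated string into unique, non-empty IDs, keeping order."""
    seen = set()
    ids = []
    for part in text.split(","):
        exp_id = part.strip()
        if exp_id and exp_id not in seen:
            seen.add(exp_id)
            ids.append(exp_id)
    return ids
```
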
## Troubleshooting

### Issue: "No metrics data available"

**Solutions**:

1. Check that `HF_TOKEN` is set correctly
2. Verify that the dataset repository exists
3. Check network connectivity to the HF Hub

### Issue: "Failed to load from dataset"

**Solutions**:

1. The app falls back to backup data automatically
2. Check the dataset permissions
3. Verify that the token has read access

### Issue: "Failed to save experiments"

**Solutions**:

1. Check that the token has write permissions
2. Verify that the dataset repository exists
3. Check network connectivity
## Benefits of This Approach

### Advantages

- **Persistent**: Data survives Space restarts
- **Reliable**: HF's infrastructure ensures availability
- **Secure**: Private datasets protect your data
- **Scalable**: Handles large amounts of experiment data
- **Versioned**: Automatic versioning of experiment data

### Fallback Strategy

1. **Primary**: Load from HF Dataset
2. **Secondary**: Use backup data (your existing experiments)
3. **Tertiary**: Create new experiments locally
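
The three tiers above can be sketched as an ordered list of loader functions that are tried in turn. This is an illustration under assumptions, not the logic in `app.py`: `load_with_fallback`, the logger name, and the `"local"` label for the empty starting state are all hypothetical.

```python
import logging

logger = logging.getLogger("trackio.fallback")  # hypothetical logger name


def load_with_fallback(loaders):
    """Try (label, loader) pairs in order.

    Returns (label, experiments) from the first loader that succeeds with a
    non-empty result. If every loader fails or returns nothing, returns
    ("local", {}) so new experiments can be created from scratch.
    """
    for label, loader in loaders:
        try:
            experiments = loader()
        except Exception as exc:  # e.g. network or permission errors
            logger.warning("%s failed: %s", label, exc)
            continue
        if experiments:
            return label, experiments
    return "local", {}
```

A call such as `load_with_fallback([("dataset", load_from_hub), ("backup", load_backup)])` would express the primary and secondary tiers, where `load_from_hub` and `load_backup` stand for whatever functions the app uses to read each source.
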
## Next Steps

1. **Set HF_TOKEN**: Add your token to the Space environment
2. **Run Setup**: Execute `setup_hf_dataset.py`
3. **Deploy App**: Push the updated `app.py` to your Space
4. **Test Plots**: Verify that experiments load and plots work
5. **Monitor Training**: New experiments will be saved to the dataset

## Security Notes

- The dataset is **private** by default
- It is only accessible with your `HF_TOKEN`
- Experiment data is stored securely on HF infrastructure
- No sensitive data is exposed publicly

---

**Your experiments are now configured for reliable persistence using Hugging Face Datasets!**