SmolFactory / docs /TRACKIO_DEPLOYMENT_FIXES.md
Tonic's picture
adds sft , quantization, better readmes
40fd629 verified
|
raw
history blame
9.1 kB

Trackio Deployment Fixes

This document outlines the fixes made to resolve the Trackio Space deployment and dataset creation issues.

Issues Identified

1. Git Authentication Issues in Space Deployment

  • Problem: The deploy_trackio_space.py script was using git commands for file upload, which failed with authentication errors
  • Solution: Replaced git commands with direct HF Hub API calls using upload_file()

2. Dataset Repository Creation Issues

  • Problem: The setup_hf_dataset.py script was trying to push to a dataset repository that didn't exist, causing 404 errors
  • Solution: Added proper repository creation using create_repo() before pushing the dataset

3. Missing Environment Variable Setup

  • Problem: The Space deployment didn't set up the required HF_TOKEN environment variable
  • Solution: Added automatic secret setting using add_space_secret() API method

4. Manual Username Input Required

  • Problem: Users had to manually enter their username
  • Solution: Automatically extract username from token using whoami() API method

5. Dataset Access Testing Issues

  • Problem: The configuration script failed when testing dataset access for non-existent datasets
  • Solution: Added proper error handling and repository existence checks

Fixed Scripts

1. scripts/trackio_tonic/deploy_trackio_space.py

Key Changes:

  • Replaced git upload with HF Hub API: Now uses upload_file() directly instead of git commands
  • Automatic secret setting: Uses add_space_secret() API to set HF_TOKEN automatically
  • Username extraction from token: Uses whoami() to get username automatically
  • Removed manual username input: No longer asks for username
  • Improved error handling: Better error messages and fallback options

Usage:

python scripts/trackio_tonic/deploy_trackio_space.py

What it does:

  1. Extracts username from HF token automatically
  2. Creates a new HF Space using the API
  3. Prepares Space files from templates
  4. Uploads files using HF Hub API (no git required)
  5. Automatically sets secrets via API (HF_TOKEN and TRACKIO_DATASET_REPO)
  6. Tests the Space accessibility

2. scripts/dataset_tonic/setup_hf_dataset.py

Key Changes:

  • Added repository creation: Creates the dataset repository before pushing data
  • Username extraction from token: Uses whoami() to get username automatically
  • Automatic dataset naming: Uses username in dataset repository name
  • Improved error handling: Better error messages for common issues
  • Public datasets by default: Makes datasets public for easier access

Usage:

python scripts/dataset_tonic/setup_hf_dataset.py

What it does:

  1. Extracts username from HF token automatically
  2. Creates the dataset repository if it doesn't exist
  3. Creates a dataset with sample experiment data
  4. Uploads README template
  5. Makes the dataset public for easier access

3. scripts/trackio_tonic/configure_trackio.py

Key Changes:

  • Added repository existence check: Checks if dataset repository exists before trying to load
  • Username extraction from token: Uses whoami() to get username automatically
  • Automatic dataset naming: Uses username in default dataset repository
  • Better error handling: Distinguishes between missing repository and permission issues
  • Improved user guidance: Clear instructions for next steps

Usage:

python scripts/trackio_tonic/configure_trackio.py

What it does:

  1. Extracts username from HF token automatically
  2. Validates current configuration
  3. Tests dataset access with proper error handling
  4. Generates configuration file with username
  5. Provides usage examples with actual username

Model Push Script (scripts/model_tonic/push_to_huggingface.py)

The model push script was already using the HF Hub API correctly, so no changes were needed. It properly:

  • Creates repositories using create_repo()
  • Uploads files using upload_file()
  • Handles authentication correctly

Environment Variables Required

For HF Spaces:

HF_TOKEN=your_hf_token_here
TRACKIO_DATASET_REPO=your-username/your-dataset-name

For Local Development:

export HF_TOKEN=your_hf_token_here
export TRACKIO_DATASET_REPO=your-username/your-dataset-name

Deployment Workflow

1. Create Dataset

# Set environment variables
export HF_TOKEN=your_token_here
# TRACKIO_DATASET_REPO will be auto-generated as username/trackio-experiments

# Create the dataset
python scripts/dataset_tonic/setup_hf_dataset.py

2. Deploy Trackio Space

# Deploy the Space (no username needed - extracted from token)
python scripts/trackio_tonic/deploy_trackio_space.py

3. Secrets are Automatically Set

The script now automatically sets the required secrets via the HF Hub API:

  • HF_TOKEN - Your Hugging Face token
  • TRACKIO_DATASET_REPO - Your dataset repository (if specified)

4. Test Configuration

# Test the configuration
python scripts/trackio_tonic/configure_trackio.py

New Features

βœ… Automatic Secret Setting

  • Uses add_space_secret() API method
  • Sets HF_TOKEN automatically
  • Sets TRACKIO_DATASET_REPO if specified
  • Falls back to manual instructions if API fails

βœ… Username Extraction from Token

  • Uses whoami() API method
  • No manual username input required
  • Automatically uses username in dataset names
  • Provides better user experience

βœ… Improved User Experience

  • Fewer manual inputs required
  • Automatic configuration based on token
  • Clear feedback about what's happening
  • Better error messages

Troubleshooting

Common Issues:

  1. "Repository not found" errors:

    • Run setup_hf_dataset.py to create the dataset first
    • Check that your HF token has write permissions
  2. "Authentication failed" errors:

  3. "Space not accessible" errors:

    • Wait 2-5 minutes for the Space to build
    • Check Space logs at the Space URL
    • Verify all files were uploaded correctly
  4. "Dataset access failed" errors:

    • Ensure the dataset repository exists
    • Check that your token has read permissions
    • Verify the dataset repository name is correct
  5. "Secret setting failed" errors:

    • The script will fall back to manual instructions
    • Follow the provided instructions to set secrets manually
    • Check that your token has write permissions to the Space

Debugging Steps:

  1. Check token permissions:

    hf whoami
    
  2. Test dataset access:

    from datasets import load_dataset
    dataset = load_dataset("your-username/your-dataset", token="your-token")
    
  3. Test Space deployment:

    python scripts/trackio_tonic/deploy_trackio_space.py
    
  4. Test secret setting:

    from huggingface_hub import HfApi
    api = HfApi(token="your-token")
    api.add_space_secret("your-username/your-space", "TEST_KEY", "test_value")
    

Security Considerations

  • Public datasets: Datasets are now public by default for easier access
  • Token security: Never commit tokens to version control
  • Space secrets: Automatically set via API, with manual fallback
  • Access control: Verify token permissions before deployment

Performance Improvements

  • Direct API calls: Eliminated git dependency for faster uploads
  • Automatic configuration: No manual username input required
  • Parallel processing: Files are uploaded individually for better error handling
  • Caching: HF Hub API handles caching automatically
  • Error recovery: Better error handling and retry logic

Future Enhancements

  1. Batch secret setting: Set multiple secrets in one API call
  2. Progress tracking: Add progress bars for large uploads
  3. Validation: Add more comprehensive validation checks
  4. Rollback: Add ability to rollback failed deployments
  5. Hardware configuration: Automatically configure Space hardware

Testing

To test the fixes:

# Test dataset creation
python scripts/dataset_tonic/setup_hf_dataset.py

# Test Space deployment
python scripts/trackio_tonic/deploy_trackio_space.py

# Test configuration
python scripts/trackio_tonic/configure_trackio.py

# Test model push (if you have a trained model)
python scripts/model_tonic/push_to_huggingface.py --model-path /path/to/model --repo-name your-username/your-model

Summary

These fixes resolve the main issues with:

  • βœ… Git authentication problems
  • βœ… Dataset repository creation failures
  • βœ… Missing environment variable setup
  • βœ… Manual username input requirement
  • βœ… Poor error handling and user feedback
  • βœ… Security concerns with public datasets

The scripts now use the HF Hub API directly, provide better error messages, handle edge cases properly, and offer a much improved user experience with automatic configuration.