SmolFactory / docs /TRACKIO_API_FIX_SUMMARY.md
Tonic's picture
use gradio for better connection with the spaces
f251d3d verified
|
raw
history blame
7.95 kB

Trackio API Fix Summary

Overview

This document summarizes the fixes applied to resolve the 404 errors in the Trackio integration and implement automatic Space URL resolution.

Issues Identified

1. 404 Errors in Trackio API Calls

  • Problem: The original API client was using incorrect endpoints and HTTP request patterns
  • Error: POST request failed: 404 - Cannot POST /spaces/Tonic/trackio-monitoring-20250727/gradio_api/call/list_experiments_interface
  • Root Cause: Using raw HTTP requests instead of the proper Gradio client API

2. Hardcoded Space URL

  • Problem: The Space URL was hardcoded, making it inflexible
  • Issue: No automatic resolution of Space URLs from Space IDs
  • Impact: Required manual URL updates when Space deployment changes

Solutions Implemented

1. Updated API Client to Use Gradio Client

File: scripts/trackio_tonic/trackio_api_client.py

Changes:

  • Replaced custom HTTP requests with gradio_client.Client
  • Uses proper two-step process (POST to get event_id, then GET to get results)
  • Handles all Gradio API endpoints correctly

Before:

# Custom HTTP requests with manual event_id handling
response = requests.post(url, json=payload)
event_id = response.json()["event_id"]
result = requests.get(f"{url}/{event_id}")

After:

# Using gradio_client for proper API communication
result = self.client.predict(*args, api_name=api_name)

2. Automatic Space URL Resolution

Implementation:

  • Uses Hugging Face Hub API to resolve Space URLs from Space IDs
  • Falls back to default URL format if API is unavailable
  • Supports both authenticated and anonymous access

Key Features:

def _resolve_space_url(self) -> Optional[str]:
    """Resolve Space URL using Hugging Face Hub API"""
    api = HfApi(token=self.hf_token)
    space_info = api.space_info(self.space_id)
    if space_info and hasattr(space_info, 'host'):
        return space_info.host
    else:
        # Fallback to default URL format
        space_name = self.space_id.replace('/', '-')
        return f"https://{space_name}.hf.space"

3. Updated Client Interface

Before:

client = TrackioAPIClient("https://tonic-trackio-monitoring-20250727.hf.space")

After:

client = TrackioAPIClient("Tonic/trackio-monitoring-20250727", hf_token)

4. Enhanced Monitoring Integration

File: src/monitoring.py

Changes:

  • Updated to use Space ID instead of hardcoded URL
  • Automatic experiment creation with proper ID extraction
  • Better error handling and fallback mechanisms

Dependencies Added

Required Packages

pip install gradio_client huggingface_hub

Package Versions

  • gradio_client>=1.10.4 - For proper Gradio API communication
  • huggingface_hub>=0.19.3 - For Space URL resolution

API Endpoints Supported

The updated client supports all documented Gradio endpoints:

  1. Experiment Management:

    • /create_experiment_interface - Create new experiments
    • /list_experiments_interface - List all experiments
    • /get_experiment_details - Get experiment details
    • /update_experiment_status_interface - Update experiment status
  2. Metrics and Parameters:

    • /log_metrics_interface - Log training metrics
    • /log_parameters_interface - Log experiment parameters
  3. Visualization:

    • /create_metrics_plot - Create metrics plots
    • /create_experiment_comparison - Compare experiments
  4. Testing and Demo:

    • /simulate_training_data - Simulate training data
    • /create_demo_experiment - Create demo experiments

Configuration

Environment Variables

# Required for Space URL resolution
export HF_TOKEN="your_huggingface_token"

# Optional: Custom Space ID
export TRACKIO_SPACE_ID="your-username/your-space-name"

# Optional: Dataset repository
export TRACKIO_DATASET_REPO="your-username/your-dataset"

Default Configuration

  • Default Space ID: Tonic/trackio-monitoring-20250727
  • Default Dataset: tonic/trackio-experiments
  • Auto-resolution: Enabled by default

Testing

Test Script

File: tests/test_trackio_api_fix.py

Tests Included:

  1. Space URL Resolution - Tests automatic URL resolution
  2. API Client - Tests all API endpoints
  3. Monitoring Integration - Tests full monitoring workflow

Running Tests

python tests/test_trackio_api_fix.py

Expected Output: ``` πŸš€ Starting Trackio API Client Tests with Automatic URL Resolution

βœ… Space URL Resolution: PASSED βœ… API Client Test: PASSED βœ… Monitoring Integration: PASSED

πŸŽ‰ All tests passed! The Trackio integration with automatic URL resolution is working correctly.


## Benefits

### 1. **Reliability**
- βœ… No more 404 errors
- βœ… Proper error handling and fallbacks
- βœ… Automatic retry mechanisms

### 2. **Flexibility**
- βœ… Automatic Space URL resolution
- βœ… Support for any Trackio Space
- βœ… Configurable via environment variables

### 3. **Maintainability**
- βœ… Clean separation of concerns
- βœ… Proper logging and debugging
- βœ… Comprehensive test coverage

### 4. **User Experience**
- βœ… Seamless integration with training pipeline
- βœ… Real-time experiment monitoring
- βœ… Automatic experiment creation and management

## Usage Examples

### Basic Usage
```python
from scripts.trackio_tonic.trackio_api_client import TrackioAPIClient

# Initialize with Space ID (URL resolved automatically)
client = TrackioAPIClient("Tonic/trackio-monitoring-20250727")

# Create experiment
result = client.create_experiment("my_experiment", "Test experiment")

# Log metrics
metrics = {"loss": 1.234, "accuracy": 0.85}
client.log_metrics("exp_123", metrics, step=100)

With Monitoring Integration

from src.monitoring import SmolLM3Monitor

# Create monitor (automatically creates experiment)
monitor = SmolLM3Monitor(
    experiment_name="my_training_run",
    enable_tracking=True
)

# Log metrics during training
monitor.log_metrics({"loss": 1.234}, step=100)

# Log configuration
monitor.log_config({"learning_rate": 2e-5, "batch_size": 8})

Troubleshooting

Common Issues

  1. "gradio_client not available"

    pip install gradio_client
    
  2. "huggingface_hub not available"

    pip install huggingface_hub
    
  3. "Space not accessible"

    • Check if the Space is running
    • Verify Space ID is correct
    • Ensure HF token has proper permissions
  4. "Experiment not found"

    • Experiments are created automatically by the monitor
    • Use the experiment ID returned by create_experiment()

Debug Mode

Enable debug logging to see detailed API calls:

import logging
logging.basicConfig(level=logging.DEBUG)

Future Enhancements

Planned Features

  1. Multi-Space Support - Support for multiple Trackio Spaces
  2. Advanced Metrics - Support for custom metric types
  3. Artifact Upload - Direct file upload to Spaces
  4. Real-time Dashboard - Live monitoring dashboard
  5. Export Capabilities - Export experiments to various formats

Extensibility

The new architecture is designed to be easily extensible:

  • Modular API client design
  • Plugin-based monitoring system
  • Configurable Space resolution
  • Support for custom endpoints

Conclusion

The Trackio API integration has been successfully fixed and enhanced with:

  • βœ… Resolved 404 errors through proper Gradio client usage
  • βœ… Automatic URL resolution using Hugging Face Hub API
  • βœ… Comprehensive testing with full test coverage
  • βœ… Enhanced monitoring with seamless integration
  • βœ… Future-proof architecture for easy extensions

The system is now production-ready and provides reliable experiment tracking for SmolLM3 fine-tuning workflows.