Push to Hugging Face Script Guide
Overview
The `push_to_huggingface.py` script now integrates with HF Datasets for experiment tracking. It pushes the trained model to the Hugging Face Hub and stores the associated experiment data persistently in a dataset repository.
Key Improvements
1. HF Datasets Integration
- Dataset Repository Support: Configurable dataset repository for experiment storage
- Environment Variables: Automatic detection of `HF_TOKEN` and `TRACKIO_DATASET_REPO`
- Enhanced Logging: Logs push actions to both Trackio and HF Datasets
- Model Card Integration: Includes dataset repository information in model cards
2. Enhanced Configuration
- Flexible Token Input: Multiple ways to provide the HF token
- Dataset Repository Tracking: Links models to their experiment datasets
- Environment Variable Support: Falls back to environment variables
- Command Line Arguments: New arguments for HF Datasets integration
3. Improved Model Cards
- Dataset Repository Info: Shows which dataset contains experiment data
- Experiment Tracking Section: Explains how to access training data
- Enhanced Documentation: Better model cards with experiment links
Usage Examples
Basic Usage
# Push model with default settings
python push_to_huggingface.py /path/to/model username/repo-name
With HF Datasets Integration
# Push model with custom dataset repository
python push_to_huggingface.py /path/to/model username/repo-name \
--dataset-repo username/experiments
With Custom Token
# Push model with custom HF token
python push_to_huggingface.py /path/to/model username/repo-name \
--hf-token your_token_here
Complete Example
# Push model with all options
python push_to_huggingface.py /path/to/model username/repo-name \
--dataset-repo username/experiments \
--hf-token your_token_here \
--private \
--experiment-name "smollm3_finetune_v2"
Command Line Arguments
Argument | Required | Default | Description |
---|---|---|---|
`model_path` | Yes | None | Path to trained model directory |
`repo_name` | Yes | None | HF repository name (`username/repo-name`) |
`--token` | No | `HF_TOKEN` env | Hugging Face token |
`--hf-token` | No | `HF_TOKEN` env | HF token (alternative to `--token`) |
`--private` | No | False | Make repository private |
`--trackio-url` | No | None | Trackio Space URL for logging |
`--experiment-name` | No | None | Experiment name for Trackio |
`--dataset-repo` | No | `TRACKIO_DATASET_REPO` env | HF Dataset repository |
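The argument table above can be mirrored with `argparse`. The following is a minimal sketch reconstructed from the table only, not the script's actual parser; the function name `build_parser` is hypothetical, and the environment-variable defaults are assumed to be applied after parsing rather than inside the parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser rebuilt from the documented argument table."""
    parser = argparse.ArgumentParser(
        description="Push a trained model to the Hugging Face Hub"
    )
    # Positional (required) arguments
    parser.add_argument("model_path", help="Path to trained model directory")
    parser.add_argument("repo_name", help="HF repository name (username/repo-name)")
    # Optional arguments; None means "fall back to the environment later"
    parser.add_argument("--token", default=None, help="Hugging Face token")
    parser.add_argument("--hf-token", default=None, help="HF token (alternative to --token)")
    parser.add_argument("--private", action="store_true", help="Make repository private")
    parser.add_argument("--trackio-url", default=None, help="Trackio Space URL for logging")
    parser.add_argument("--experiment-name", default=None, help="Experiment name for Trackio")
    parser.add_argument("--dataset-repo", default=None, help="HF Dataset repository")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["outputs/model", "username/smollm3-french",
     "--dataset-repo", "username/experiments", "--private"]
)
print(args.model_path, args.repo_name, args.dataset_repo, args.private)
```

`argparse` converts dashes in option names to underscores, so `--hf-token` is read back as `args.hf_token` and `--dataset-repo` as `args.dataset_repo`.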
Configuration Methods
Method 1: Command Line Arguments
python push_to_huggingface.py model_path repo_name \
--dataset-repo username/experiments \
--hf-token your_token_here
Method 2: Environment Variables
export HF_TOKEN=your_token_here
export TRACKIO_DATASET_REPO=username/experiments
python push_to_huggingface.py model_path repo_name
Method 3: Hybrid Approach
# Set defaults via environment variables
export HF_TOKEN=your_token_here
export TRACKIO_DATASET_REPO=username/experiments
# Override specific values via command line
python push_to_huggingface.py model_path repo_name \
--dataset-repo username/specific-experiments
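The three methods imply a precedence order: a command line value wins, otherwise the environment variable is used. A minimal sketch of that fallback follows; the helper name `resolve_setting` is hypothetical, and the real script may resolve its settings differently.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    cli_value: Optional[str],
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the CLI value if given, else the environment variable, else None."""
    if cli_value:
        return cli_value
    # Treat an empty environment variable the same as an unset one.
    return env.get(env_name) or None


# A plain dict stands in for the real environment so the example is repeatable.
fake_env = {"HF_TOKEN": "your_token_here", "TRACKIO_DATASET_REPO": "username/experiments"}

token = resolve_setting(None, "HF_TOKEN", fake_env)
dataset_repo = resolve_setting(
    "username/specific-experiments", "TRACKIO_DATASET_REPO", fake_env
)
print(token, dataset_repo)
```

Passing the environment as a parameter keeps the function easy to test; in normal use the default `os.environ` applies.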
What Gets Pushed
Model Files
- Model Weights: `pytorch_model.bin`
- Configuration: `config.json`
- Tokenizer: `tokenizer.json`, `tokenizer_config.json`
- All Other Files: Any additional files in the model directory
Documentation
- Model Card: Comprehensive README.md with model information
- Training Configuration: JSON configuration used for training
- Training Results: JSON results and metrics
- Training Logs: Text logs from the training process
Experiment Data
- Dataset Repository: Links to the HF Dataset containing experiment data
- Training Metrics: All training metrics stored in the dataset
- Configuration: Training configuration stored in the dataset
- Artifacts: Training artifacts and logs
Enhanced Model Cards
The improved script creates enhanced model cards that include:
Model Information
- Base model and architecture
- Training date and model size
- Dataset repository for experiment data
Training Configuration
- Complete training parameters
- Hardware information
- Training duration and steps
Experiment Tracking
- Links to HF Dataset repository
- Instructions for accessing experiment data
- Training metrics and results
Usage Examples
- Code examples for loading and using the model
- Generation examples
- Performance information
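To illustrate how the dataset repository could be woven into a model card, here is a minimal sketch. The function `render_model_card`, its parameters, the section layout, and the sample values are all hypothetical; only the dataset URL pattern (`https://huggingface.co/datasets/<repo>`) comes from this guide, and the card the script actually generates is likely richer.

```python
from datetime import date
from typing import Optional


def render_model_card(
    repo_name: str,
    base_model: str,
    dataset_repo: Optional[str] = None,
    training_date: Optional[date] = None,
) -> str:
    """Build a minimal README.md body; the section names are illustrative."""
    training_date = training_date or date.today()
    lines = [
        f"# {repo_name}",
        "",
        "## Model Information",
        f"- Base model: {base_model}",
        f"- Training date: {training_date.isoformat()}",
    ]
    # Only add the experiment section when a dataset repository is configured.
    if dataset_repo:
        lines += [
            f"- Dataset repository: {dataset_repo}",
            "",
            "## Experiment Tracking",
            f"Experiment data: https://huggingface.co/datasets/{dataset_repo}",
        ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
card = render_model_card(
    "username/smollm3-french",
    "SmolLM3",
    dataset_repo="username/experiments",
    training_date=date(2025, 1, 1),
)
print(card)
```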
Logging Integration
Trackio Logging
- Push Actions: Logs model push events
- Model Information: Repository name, size, configuration
- Training Data: Links to the experiment dataset
HF Datasets Logging
- Experiment Summary: Final training summary
- Push Metadata: Model repository and push date
- Configuration: Complete training configuration
Dual Storage
- Trackio: Real-time monitoring and visualization
- HF Datasets: Persistent experiment storage
- Synchronized: Both systems updated together
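One way to keep the two stores in step is to build a single push record and hand the same object to each backend. The sketch below assumes that design; `build_push_record`, `log_push`, and the field names are hypothetical, and plain lists stand in for the Trackio and HF Datasets clients.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Optional

PushRecord = Dict[str, Optional[str]]


def build_push_record(
    model_repo: str,
    dataset_repo: Optional[str],
    experiment_name: Optional[str] = None,
) -> PushRecord:
    """Collect the push metadata described above into one record."""
    return {
        "event": "model_push",
        "model_repo": model_repo,
        "dataset_repo": dataset_repo,
        "experiment_name": experiment_name,
        "pushed_at": datetime.now(timezone.utc).isoformat(),
    }


def log_push(record: PushRecord, sinks: Iterable[Callable[[PushRecord], None]]) -> None:
    """Send the same record to every sink so both stores receive identical data."""
    for sink in sinks:
        sink(record)


# Stand-ins for the real backends: each sink just appends to a list.
trackio_events: list = []
dataset_rows: list = []

record = build_push_record(
    "username/smollm3-french", "username/experiments", "smollm3_french_v2"
)
log_push(record, [trackio_events.append, dataset_rows.append])
print(record["model_repo"], len(trackio_events), len(dataset_rows))
```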
Troubleshooting
Issue: "Missing required files"
Solutions:
- Check model directory contains required files
- Ensure model was saved correctly during training
- Verify file permissions
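Before pushing, the model directory can be checked locally. This sketch assumes the required set is the files named under "Model Files" in this guide; the exact list the script enforces is not documented here, so `ASSUMED_REQUIRED_FILES` and `missing_files` are illustrative names.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Assumption: taken from the "Model Files" list, not from the script itself.
ASSUMED_REQUIRED_FILES = (
    "config.json",
    "pytorch_model.bin",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_files(
    model_dir: str, required: Iterable[str] = ASSUMED_REQUIRED_FILES
) -> List[str]:
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# Demonstrate on a temporary directory that contains only config.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    missing = missing_files(tmp)

print("Missing:", missing)
```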
Issue: "Failed to create repository"
Solutions:
- Check HF token has write permissions
- Verify repository name format: `username/repo-name`
- Ensure the repository doesn't already exist (or use `--private`)
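The `username/repo-name` format can be checked before any network call. The pattern below is an assumed approximation of the Hub's naming rules, not an official specification, and `is_valid_repo_id` is a hypothetical helper.

```python
import re

# Assumption: two segments of letters, digits, '.', '_' or '-', separated by one '/'.
REPO_ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def is_valid_repo_id(repo_id: str) -> bool:
    """Return True when repo_id looks like 'username/repo-name'."""
    return REPO_ID_PATTERN.fullmatch(repo_id) is not None


for candidate in ["username/repo-name", "repo-name", "username/", "a/b/c"]:
    print(candidate, is_valid_repo_id(candidate))
```

If the script creates repositories through `huggingface_hub`, note that `create_repo` accepts `exist_ok=True`, which avoids an error when the repository already exists. Whether the script exposes that option is not stated in this guide.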
Issue: "Failed to upload files"
Solutions:
- Check network connectivity
- Verify HF token is valid
- Ensure repository was created successfully
Issue: "Dataset repository not found"
Solutions:
- Check dataset repository exists
- Verify HF token has read access
- Use `--dataset-repo` to specify the correct repository
Workflow Integration
Complete Training Workflow
- Train Model: Use training scripts with monitoring
- Monitor Progress: View metrics in Trackio interface
- Push Model: Use improved push script
- Access Data: View experiments in HF Dataset repository
Example Workflow
# 1. Train model with monitoring
python train.py config/train_smollm3_openhermes_fr.py \
--experiment_name "smollm3_french_v2"
# 2. Push model to HF Hub
python push_to_huggingface.py outputs/model username/smollm3-french \
--dataset-repo username/experiments \
--experiment-name "smollm3_french_v2"
# 3. View results
# - Model: https://huggingface.co/username/smollm3-french
# - Experiments: https://huggingface.co/datasets/username/experiments
# - Trackio: Your Trackio Space interface
Benefits
For Model Deployment
- Complete Documentation: Enhanced model cards with experiment links
- Persistent Storage: Experiment data stored in HF Datasets
- Easy Access: Direct links to training data and metrics
- Reproducibility: Complete training configuration included
For Experiment Management
- Centralized Storage: All experiments in one HF Dataset repository
- Version Control: Model versions linked to experiment data
- Collaboration: Share experiments and models easily
- Searchability: Easy to find specific experiments
For Development
- Flexible Configuration: Multiple ways to set parameters
- Backward Compatible: Works with existing setups
- Error Handling: Clear error messages and troubleshooting
- Integration: Works with the existing monitoring system
Testing Results
All push script tests passed:
- HuggingFacePusher Initialization: Works with new parameters
- Model Card Creation: Includes HF Datasets integration
- Logging Integration: Logs to both Trackio and HF Datasets
- Argument Parsing: Handles new command line arguments
- Environment Variables: Proper fallback handling
Migration Guide
From Old Script
# Old way
python push_to_huggingface.py model_path repo_name --token your_token
# New way (same functionality)
python push_to_huggingface.py model_path repo_name --hf-token your_token
# New way with HF Datasets
python push_to_huggingface.py model_path repo_name \
--hf-token your_token \
--dataset-repo username/experiments
Environment Variables
# Set environment variables for automatic detection
export HF_TOKEN=your_token_here
export TRACKIO_DATASET_REPO=username/experiments
# Then use simple command
python push_to_huggingface.py model_path repo_name
Your push script is now fully integrated with HF Datasets for complete experiment tracking and model deployment!