Spaces:
Running
Running
# π Push to Hugging Face Script Guide | |
## Overview | |
The `push_to_huggingface.py` script has been enhanced to integrate with **HF Datasets** for experiment tracking and provides complete model deployment with persistent experiment storage. | |
## π Key Improvements | |
### **1. HF Datasets Integration** | |
- β **Dataset Repository Support**: Configurable dataset repository for experiment storage | |
- β **Environment Variables**: Automatic detection of `HF_TOKEN` and `TRACKIO_DATASET_REPO` | |
- β **Enhanced Logging**: Logs push actions to both Trackio and HF Datasets | |
- β **Model Card Integration**: Includes dataset repository information in model cards | |
### **2. Enhanced Configuration** | |
- β **Flexible Token Input**: Multiple ways to provide HF token | |
- β **Dataset Repository Tracking**: Links models to their experiment datasets | |
- β **Environment Variable Support**: Fallback to environment variables | |
- β **Command Line Arguments**: New arguments for HF Datasets integration | |
### **3. Improved Model Cards** | |
- β **Dataset Repository Info**: Shows which dataset contains experiment data | |
- β **Experiment Tracking Section**: Explains how to access training data | |
- β **Enhanced Documentation**: Better model cards with experiment links | |
## π Usage Examples | |
### **Basic Usage** | |
```bash | |
# Push model with default settings | |
python push_to_huggingface.py /path/to/model username/repo-name | |
``` | |
### **With HF Datasets Integration** | |
```bash | |
# Push model with custom dataset repository | |
python push_to_huggingface.py /path/to/model username/repo-name \ | |
--dataset-repo username/experiments | |
``` | |
### **With Custom Token** | |
```bash | |
# Push model with custom HF token | |
python push_to_huggingface.py /path/to/model username/repo-name \ | |
--hf-token your_token_here | |
``` | |
### **Complete Example** | |
```bash | |
# Push model with all options | |
python push_to_huggingface.py /path/to/model username/repo-name \ | |
--dataset-repo username/experiments \ | |
--hf-token your_token_here \ | |
--private \ | |
--experiment-name "smollm3_finetune_v2" | |
``` | |
## π§ Command Line Arguments | |
| Argument | Required | Default | Description | | |
|----------|----------|---------|-------------| | |
| `model_path` | β Yes | None | Path to trained model directory | | |
| `repo_name` | β Yes | None | HF repository name (username/repo-name) | | |
| `--token` | β No | `HF_TOKEN` env | Hugging Face token | | |
| `--hf-token` | β No | `HF_TOKEN` env | HF token (alternative to --token) | | |
| `--private` | β No | False | Make repository private | | |
| `--trackio-url` | β No | None | Trackio Space URL for logging | | |
| `--experiment-name` | β No | None | Experiment name for Trackio | | |
| `--dataset-repo` | β No | `TRACKIO_DATASET_REPO` env | HF Dataset repository | | |
## π οΈ Configuration Methods | |
### **Method 1: Command Line Arguments** | |
```bash | |
python push_to_huggingface.py model_path repo_name \ | |
--dataset-repo username/experiments \ | |
--hf-token your_token_here | |
``` | |
### **Method 2: Environment Variables** | |
```bash | |
export HF_TOKEN=your_token_here | |
export TRACKIO_DATASET_REPO=username/experiments | |
python push_to_huggingface.py model_path repo_name | |
``` | |
### **Method 3: Hybrid Approach** | |
```bash | |
# Set defaults via environment variables | |
export HF_TOKEN=your_token_here | |
export TRACKIO_DATASET_REPO=username/experiments | |
# Override specific values via command line | |
python push_to_huggingface.py model_path repo_name \ | |
--dataset-repo username/specific-experiments | |
``` | |
## π What Gets Pushed | |
### **Model Files** | |
- β **Model Weights**: `pytorch_model.bin` | |
- β **Configuration**: `config.json` | |
- β **Tokenizer**: `tokenizer.json`, `tokenizer_config.json` | |
- β **All Other Files**: Any additional files in model directory | |
### **Documentation** | |
- β **Model Card**: Comprehensive README.md with model information | |
- β **Training Configuration**: JSON configuration used for training | |
- β **Training Results**: JSON results and metrics | |
- β **Training Logs**: Text logs from training process | |
### **Experiment Data** | |
- β **Dataset Repository**: Links to HF Dataset containing experiment data | |
- β **Training Metrics**: All training metrics stored in dataset | |
- β **Configuration**: Training configuration stored in dataset | |
- β **Artifacts**: Training artifacts and logs | |
## π Enhanced Model Cards | |
The improved script creates enhanced model cards that include: | |
### **Model Information** | |
- Base model and architecture | |
- Training date and model size | |
- **Dataset repository** for experiment data | |
### **Training Configuration** | |
- Complete training parameters | |
- Hardware information | |
- Training duration and steps | |
### **Experiment Tracking** | |
- Links to HF Dataset repository | |
- Instructions for accessing experiment data | |
- Training metrics and results | |
### **Usage Examples** | |
- Code examples for loading and using the model | |
- Generation examples | |
- Performance information | |
## π Logging Integration | |
### **Trackio Logging** | |
- β **Push Actions**: Logs model push events | |
- β **Model Information**: Repository name, size, configuration | |
- β **Training Data**: Links to experiment dataset | |
### **HF Datasets Logging** | |
- β **Experiment Summary**: Final training summary | |
- β **Push Metadata**: Model repository and push date | |
- β **Configuration**: Complete training configuration | |
### **Dual Storage** | |
- β **Trackio**: Real-time monitoring and visualization | |
- β **HF Datasets**: Persistent experiment storage | |
- β **Synchronized**: Both systems updated together | |
## π¨ Troubleshooting | |
### **Issue: "Missing required files"** | |
**Solutions**: | |
1. Check model directory contains required files | |
2. Ensure model was saved correctly during training | |
3. Verify file permissions | |
### **Issue: "Failed to create repository"** | |
**Solutions**: | |
1. Check HF token has write permissions | |
2. Verify repository name format: `username/repo-name` | |
3. Ensure repository doesn't already exist (or use `--private`) | |
### **Issue: "Failed to upload files"** | |
**Solutions**: | |
1. Check network connectivity | |
2. Verify HF token is valid | |
3. Ensure repository was created successfully | |
### **Issue: "Dataset repository not found"** | |
**Solutions**: | |
1. Check dataset repository exists | |
2. Verify HF token has read access | |
3. Use `--dataset-repo` to specify correct repository | |
## π Workflow Integration | |
### **Complete Training Workflow** | |
1. **Train Model**: Use training scripts with monitoring | |
2. **Monitor Progress**: View metrics in Trackio interface | |
3. **Push Model**: Use improved push script | |
4. **Access Data**: View experiments in HF Dataset repository | |
### **Example Workflow** | |
```bash | |
# 1. Train model with monitoring | |
python train.py config/train_smollm3_openhermes_fr.py \ | |
--experiment_name "smollm3_french_v2" | |
# 2. Push model to HF Hub | |
python push_to_huggingface.py outputs/model username/smollm3-french \ | |
--dataset-repo username/experiments \ | |
--experiment-name "smollm3_french_v2" | |
# 3. View results | |
# - Model: https://huggingface.co/username/smollm3-french | |
# - Experiments: https://huggingface.co/datasets/username/experiments | |
# - Trackio: Your Trackio Space interface | |
``` | |
## π― Benefits | |
### **For Model Deployment** | |
- β **Complete Documentation**: Enhanced model cards with experiment links | |
- β **Persistent Storage**: Experiment data stored in HF Datasets | |
- β **Easy Access**: Direct links to training data and metrics | |
- β **Reproducibility**: Complete training configuration included | |
### **For Experiment Management** | |
- β **Centralized Storage**: All experiments in HF Dataset repository | |
- β **Version Control**: Model versions linked to experiment data | |
- β **Collaboration**: Share experiments and models easily | |
- β **Searchability**: Easy to find specific experiments | |
### **For Development** | |
- β **Flexible Configuration**: Multiple ways to set parameters | |
- β **Backward Compatible**: Works with existing setups | |
- β **Error Handling**: Clear error messages and troubleshooting | |
- β **Integration**: Works with existing monitoring system | |
## π Testing Results | |
All push script tests passed: | |
- β **HuggingFacePusher Initialization**: Works with new parameters | |
- β **Model Card Creation**: Includes HF Datasets integration | |
- β **Logging Integration**: Logs to both Trackio and HF Datasets | |
- β **Argument Parsing**: Handles new command line arguments | |
- β **Environment Variables**: Proper fallback handling | |
## π Migration Guide | |
### **From Old Script** | |
```bash | |
# Old way | |
python push_to_huggingface.py model_path repo_name --token your_token | |
# New way (same functionality) | |
python push_to_huggingface.py model_path repo_name --hf-token your_token | |
# New way with HF Datasets | |
python push_to_huggingface.py model_path repo_name \ | |
--hf-token your_token \ | |
--dataset-repo username/experiments | |
``` | |
### **Environment Variables** | |
```bash | |
# Set environment variables for automatic detection | |
export HF_TOKEN=your_token_here | |
export TRACKIO_DATASET_REPO=username/experiments | |
# Then use simple command | |
python push_to_huggingface.py model_path repo_name | |
``` | |
--- | |
**π Your push script is now fully integrated with HF Datasets for complete experiment tracking and model deployment!** |