SmolFactory / docs /GIT_CONFIGURATION_GUIDE.md
Tonic's picture
adds sft , quantization, better readmes
40fd629 verified
|
raw
history blame
7.96 kB

Git Configuration Guide for Hugging Face Operations

This guide explains the correct way to configure git for Hugging Face Spaces deployment and model pushing operations.

🎯 Overview

When working with Hugging Face Spaces and model repositories, proper git configuration is essential for:

  • Creating and deploying Spaces
  • Pushing models to the Hub
  • Managing experiment tracking datasets
  • Ensuring proper authentication
  • Using the user's actual email address for proper git identity and commit attribution

βœ… Correct Git Configuration

1. Local vs Global Configuration

❌ Wrong (Current):

git config --global user.email "$HF_USERNAME@example.com"
git config --global user.name "$HF_USERNAME"

βœ… Correct (Updated):

# Get user's actual email address
read -p "Enter your email address for git configuration: " GIT_EMAIL

# Configure git locally for this project only
git config user.email "$GIT_EMAIL"
git config user.name "$HF_USERNAME"

# Verify configuration
git config user.email
git config user.name

2. Proper Authentication Setup

βœ… Correct Authentication:

# Login with token and add to git credentials
hf login --token "$HF_TOKEN" --add-to-git-credential

# Verify login
hf whoami

3. Error Handling

βœ… Robust Configuration:

# Get user's email and configure git with error handling
read -p "Enter your email address for git configuration: " GIT_EMAIL

if git config user.email "$GIT_EMAIL" && \
   git config user.name "$HF_USERNAME"; then
    echo "βœ… Git configured successfully"
    echo "  Email: $(git config user.email)"
    echo "  Name: $(git config user.name)"
else
    echo "❌ Failed to configure git"
    exit 1
fi

πŸ”§ Why These Changes Matter

1. Local Configuration Benefits

  • Isolation: Doesn't affect other projects on the system
  • Project-specific: Each project can have different git settings
  • Cleaner: No global state pollution
  • Safer: Won't interfere with existing git configurations

2. User's Actual Email Address

  • Professional: Uses the user's real email address
  • Authentic: Represents the actual user's identity
  • Consistent: Matches the user's Hugging Face account
  • Best Practice: Follows git configuration standards

3. Token-based Authentication

  • Secure: Uses HF token instead of username/password
  • Automated: No manual password entry required
  • Persistent: Credentials stored securely
  • Verified: Includes verification steps

πŸ“‹ Implementation in Launch Script

Updated Authentication Step:

# Step 8: Authentication setup
print_step "Step 8: Authentication Setup"
echo "================================"

export HF_TOKEN="$HF_TOKEN"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"

# Login to Hugging Face with token
print_info "Logging in to Hugging Face..."
if hf login --token "$HF_TOKEN" --add-to-git-credential; then
    print_status "Successfully logged in to Hugging Face"
    print_info "Username: $(hf whoami)"
else
    print_error "Failed to login to Hugging Face"
    print_error "Please check your token and try again"
    exit 1
fi

# Configure git for HF operations
print_step "Step 8.1: Git Configuration"
echo "================================"

print_info "Configuring git for Hugging Face operations..."

# Get user's email for git configuration
get_input "Enter your email address for git configuration" "" GIT_EMAIL

# Configure git locally (not globally) for this project
git config user.email "$GIT_EMAIL"
git config user.name "$HF_USERNAME"

# Verify git configuration
print_info "Verifying git configuration..."
if git config user.email && git config user.name; then
    print_status "Git configured successfully"
    print_info "  Email: $(git config user.email)"
    print_info "  Name: $(git config user.name)"
else
    print_error "Failed to configure git"
    exit 1
fi

πŸš€ Deployment Script Improvements

Robust File Upload:

def upload_files(self) -> bool:
    """Upload necessary files to the Space"""
    try:
        print("Uploading files to Space...")
        
        # Files to upload
        files_to_upload = [
            "app.py",
            "requirements_space.txt",
            "README.md"
        ]
        
        # Check if we're in a git repository
        try:
            subprocess.run(["git", "status"], capture_output=True, check=True)
        except subprocess.CalledProcessError:
            print("⚠️  Not in a git repository, initializing...")
            subprocess.run(["git", "init"], check=True)
            subprocess.run(["git", "remote", "add", "origin", f"https://huggingface.co/spaces/{self.username}/{self.space_name}"], check=True)
        
        # Add all files at once
        existing_files = [f for f in files_to_upload if os.path.exists(f)]
        if existing_files:
            subprocess.run(["git", "add"] + existing_files, check=True)
            subprocess.run(["git", "commit", "-m", "Initial Space setup"], check=True)
            
            # Push to the space
            try:
                subprocess.run(["git", "push", "origin", "main"], check=True)
                print(f"βœ… Uploaded {len(existing_files)} files")
            except subprocess.CalledProcessError:
                # Try pushing to master branch if main doesn't exist
                subprocess.run(["git", "push", "origin", "master"], check=True)
                print(f"βœ… Uploaded {len(existing_files)} files")
        else:
            print("⚠️  No files found to upload")
        
        return True
        
    except Exception as e:
        print(f"❌ Error uploading files: {e}")
        return False

πŸ” Troubleshooting

Common Issues and Solutions:

1. Git Configuration Fails

# Check current git config
git config --list

# Reset if needed
git config --unset user.email
git config --unset user.name

# Reconfigure
git config user.email "[email protected]"
git config user.name "your-username"

2. Authentication Issues

# Check HF login status
hf whoami

# Re-login if needed
hf logout
hf login --token "your-token"

3. Space Deployment Fails

# Check git remote
git remote -v

# Re-add remote if needed
git remote remove origin
git remote add origin https://huggingface.co/spaces/username/space-name

πŸ“š Best Practices

1. Always Use Local Configuration

  • Use git config without --global flag
  • Keeps project configurations isolated
  • Prevents conflicts with other projects

2. Verify Configuration

  • Always check that git config was successful
  • Display configured values for verification
  • Exit on failure to prevent downstream issues

3. Use Token-based Authentication

  • More secure than username/password
  • Automatically handles credential storage
  • Works well with CI/CD systems

4. Handle Errors Gracefully

  • Check return codes from git commands
  • Provide clear error messages
  • Exit early on critical failures

5. Test Configuration

  • Verify git config after setting it
  • Test HF login before proceeding
  • Validate remote repository access

🎯 Summary

The updated git configuration approach provides:

  1. βœ… Better Isolation: Local configuration doesn't affect system-wide settings
  2. βœ… User's Actual Email: Uses the user's real email address for proper git identity
  3. βœ… Proper Authentication: Token-based login with credential storage
  4. βœ… Error Handling: Robust verification and error reporting
  5. βœ… Professional Setup: Uses user's actual email and verification
  6. βœ… Deployment Reliability: Improved Space deployment with git repository handling

This ensures a more reliable and professional setup for Hugging Face operations in the SmolLM3 fine-tuning pipeline.