SmolFactory / docs /GIT_CONFIGURATION_GUIDE.md
Tonic's picture
adds sft , quantization, better readmes
40fd629 verified
|
raw
history blame
7.96 kB
# Git Configuration Guide for Hugging Face Operations
This guide explains the correct way to configure git for Hugging Face Spaces deployment and model pushing operations.
## 🎯 **Overview**
When working with Hugging Face Spaces and model repositories, proper git configuration is essential for:
- Creating and deploying Spaces
- Pushing models to the Hub
- Managing experiment tracking datasets
- Ensuring proper authentication
- **Using the user's actual email address for proper git identity and commit attribution**
## βœ… **Correct Git Configuration**
### **1. Local vs Global Configuration**
**❌ Wrong (Current):**
```bash
git config --global user.email "[email protected]"
git config --global user.name "$HF_USERNAME"
```
**βœ… Correct (Updated):**
```bash
# Get user's actual email address
read -p "Enter your email address for git configuration: " GIT_EMAIL
# Configure git locally for this project only
git config user.email "$GIT_EMAIL"
git config user.name "$HF_USERNAME"
# Verify configuration
git config user.email
git config user.name
```
### **2. Proper Authentication Setup**
**βœ… Correct Authentication:**
```bash
# Login with token and add to git credentials
hf login --token "$HF_TOKEN" --add-to-git-credential
# Verify login
hf whoami
```
### **3. Error Handling**
**βœ… Robust Configuration:**
```bash
# Get user's email and configure git with error handling
read -p "Enter your email address for git configuration: " GIT_EMAIL
if git config user.email "$GIT_EMAIL" && \
git config user.name "$HF_USERNAME"; then
echo "βœ… Git configured successfully"
echo " Email: $(git config user.email)"
echo " Name: $(git config user.name)"
else
echo "❌ Failed to configure git"
exit 1
fi
```
## πŸ”§ **Why These Changes Matter**
### **1. Local Configuration Benefits**
- **Isolation**: Doesn't affect other projects on the system
- **Project-specific**: Each project can have different git settings
- **Cleaner**: No global state pollution
- **Safer**: Won't interfere with existing git configurations
### **2. User's Actual Email Address**
- **Professional**: Uses the user's real email address
- **Authentic**: Represents the actual user's identity
- **Consistent**: Matches the user's Hugging Face account
- **Best Practice**: Follows git configuration standards
### **3. Token-based Authentication**
- **Secure**: Uses HF token instead of username/password
- **Automated**: No manual password entry required
- **Persistent**: Credentials stored securely
- **Verified**: Includes verification steps
## πŸ“‹ **Implementation in Launch Script**
### **Updated Authentication Step:**
```bash
# Step 8: Authentication setup
print_step "Step 8: Authentication Setup"
echo "================================"
export HF_TOKEN="$HF_TOKEN"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
# Login to Hugging Face with token
print_info "Logging in to Hugging Face..."
if hf login --token "$HF_TOKEN" --add-to-git-credential; then
print_status "Successfully logged in to Hugging Face"
print_info "Username: $(hf whoami)"
else
print_error "Failed to login to Hugging Face"
print_error "Please check your token and try again"
exit 1
fi
# Configure git for HF operations
print_step "Step 8.1: Git Configuration"
echo "================================"
print_info "Configuring git for Hugging Face operations..."
# Get user's email for git configuration
get_input "Enter your email address for git configuration" "" GIT_EMAIL
# Configure git locally (not globally) for this project
git config user.email "$GIT_EMAIL"
git config user.name "$HF_USERNAME"
# Verify git configuration
print_info "Verifying git configuration..."
if git config user.email && git config user.name; then
print_status "Git configured successfully"
print_info " Email: $(git config user.email)"
print_info " Name: $(git config user.name)"
else
print_error "Failed to configure git"
exit 1
fi
```
## πŸš€ **Deployment Script Improvements**
### **Robust File Upload:**
```python
def upload_files(self) -> bool:
"""Upload necessary files to the Space"""
try:
print("Uploading files to Space...")
# Files to upload
files_to_upload = [
"app.py",
"requirements_space.txt",
"README.md"
]
# Check if we're in a git repository
try:
subprocess.run(["git", "status"], capture_output=True, check=True)
except subprocess.CalledProcessError:
print("⚠️ Not in a git repository, initializing...")
subprocess.run(["git", "init"], check=True)
subprocess.run(["git", "remote", "add", "origin", f"https://huggingface.co/spaces/{self.username}/{self.space_name}"], check=True)
# Add all files at once
existing_files = [f for f in files_to_upload if os.path.exists(f)]
if existing_files:
subprocess.run(["git", "add"] + existing_files, check=True)
subprocess.run(["git", "commit", "-m", "Initial Space setup"], check=True)
# Push to the space
try:
subprocess.run(["git", "push", "origin", "main"], check=True)
print(f"βœ… Uploaded {len(existing_files)} files")
except subprocess.CalledProcessError:
# Try pushing to master branch if main doesn't exist
subprocess.run(["git", "push", "origin", "master"], check=True)
print(f"βœ… Uploaded {len(existing_files)} files")
else:
print("⚠️ No files found to upload")
return True
except Exception as e:
print(f"❌ Error uploading files: {e}")
return False
```
## πŸ” **Troubleshooting**
### **Common Issues and Solutions:**
#### **1. Git Configuration Fails**
```bash
# Check current git config
git config --list
# Reset if needed
git config --unset user.email
git config --unset user.name
# Reconfigure
git config user.email "[email protected]"
git config user.name "your-username"
```
#### **2. Authentication Issues**
```bash
# Check HF login status
hf whoami
# Re-login if needed
hf logout
hf login --token "your-token"
```
#### **3. Space Deployment Fails**
```bash
# Check git remote
git remote -v
# Re-add remote if needed
git remote remove origin
git remote add origin https://huggingface.co/spaces/username/space-name
```
## πŸ“š **Best Practices**
### **1. Always Use Local Configuration**
- Use `git config` without `--global` flag
- Keeps project configurations isolated
- Prevents conflicts with other projects
### **2. Verify Configuration**
- Always check that git config was successful
- Display configured values for verification
- Exit on failure to prevent downstream issues
### **3. Use Token-based Authentication**
- More secure than username/password
- Automatically handles credential storage
- Works well with CI/CD systems
### **4. Handle Errors Gracefully**
- Check return codes from git commands
- Provide clear error messages
- Exit early on critical failures
### **5. Test Configuration**
- Verify git config after setting it
- Test HF login before proceeding
- Validate remote repository access
## 🎯 **Summary**
The updated git configuration approach provides:
1. **βœ… Better Isolation**: Local configuration doesn't affect system-wide settings
2. **βœ… User's Actual Email**: Uses the user's real email address for proper git identity
3. **βœ… Proper Authentication**: Token-based login with credential storage
4. **βœ… Error Handling**: Robust verification and error reporting
5. **βœ… Professional Setup**: Uses user's actual email and verification
6. **βœ… Deployment Reliability**: Improved Space deployment with git repository handling
This ensures a more reliable and professional setup for Hugging Face operations in the SmolLM3 fine-tuning pipeline.