SmolFactory / docs /ENVIRONMENT_SETUP_FIX.md
Tonic's picture
adds hf cli fixes
fd0524b verified
|
raw
history blame
7.92 kB

Environment Setup Fix

Issue Identified

The user requested to ensure that the provided token is properly available in the new virtual environment created during the launch script execution to avoid errors.

Root Cause

The launch.sh script was setting environment variables after creating the virtual environment, which could cause the token to not be available within the virtual environment context.

Fixes Applied

1. Environment Variables Set Before Virtual Environment βœ… FIXED

File: launch.sh

Changes:

  • Set environment variables before creating the virtual environment
  • Re-export environment variables after activating the virtual environment
  • Added verification step to ensure token is available

Before:

print_info "Creating Python virtual environment..."
python3 -m venv smollm3_env
source smollm3_env/bin/activate

# ... install dependencies ...

# Step 8: Authentication setup
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"

After:

# Set environment variables before creating virtual environment
print_info "Setting up environment variables..."
export HF_TOKEN="$HF_TOKEN"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"

print_info "Creating Python virtual environment..."
python3 -m venv smollm3_env
source smollm3_env/bin/activate

# Re-export environment variables in the virtual environment
print_info "Configuring environment variables in virtual environment..."
export HF_TOKEN="$HF_TOKEN"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"

2. Token Verification Step βœ… ADDED

File: launch.sh

Added verification to ensure token is properly configured:

# Verify token is available in the virtual environment
print_info "Verifying token availability in virtual environment..."
if [ -n "$HF_TOKEN" ] && [ -n "$HUGGING_FACE_HUB_TOKEN" ]; then
    print_status "βœ… Token properly configured in virtual environment"
    print_info "  HF_TOKEN: ${HF_TOKEN:0:10}...${HF_TOKEN: -4}"
    print_info "  HUGGING_FACE_HUB_TOKEN: ${HUGGING_FACE_HUB_TOKEN:0:10}...${HUGGING_FACE_HUB_TOKEN: -4}"
else
    print_error "❌ Token not properly configured in virtual environment"
    print_error "Please check your token and try again"
    exit 1
fi

3. Environment Variables Before Each Script Call βœ… ADDED

File: launch.sh

Added environment variable exports before each Python script call:

Trackio Space Deployment:

# Ensure environment variables are available for the script
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"

python deploy_trackio_space.py "$TRACKIO_SPACE_NAME" "$HF_TOKEN" "$GIT_EMAIL"

Dataset Setup:

# Ensure environment variables are available for the script
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"

python setup_hf_dataset.py "$HF_TOKEN"

Trackio Configuration:

# Ensure environment variables are available for the script
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"

python configure_trackio.py

Training Script:

# Ensure environment variables are available for training
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"

python scripts/training/train.py \
    --config "$CONFIG_FILE" \
    --experiment-name "$EXPERIMENT_NAME" \
    --output-dir /output-checkpoint \
    --trackio-url "$TRACKIO_URL" \
    --trainer-type "$TRAINER_TYPE"

Model Push:

# Ensure environment variables are available for model push
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"

python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME" \
    --token "$HF_TOKEN" \
    --trackio-url "$TRACKIO_URL" \
    --experiment-name "$EXPERIMENT_NAME" \
    --dataset-repo "$TRACKIO_DATASET_REPO"

Quantization Scripts:

# Ensure environment variables are available for quantization
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"

python scripts/model_tonic/quantize_model.py /output-checkpoint "$REPO_NAME" \
    --quant-type "$QUANT_TYPE" \
    --device "$DEVICE" \
    --token "$HF_TOKEN" \
    --trackio-url "$TRACKIO_URL" \
    --experiment-name "${EXPERIMENT_NAME}-${QUANT_TYPE}" \
    --dataset-repo "$TRACKIO_DATASET_REPO"

Key Improvements

1. Proper Environment Variable Timing

  • βœ… Set before virtual environment: Variables set before creating venv
  • βœ… Re-export after activation: Variables re-exported after activating venv
  • βœ… Before each script: Variables exported before each Python script call
  • βœ… Verification step: Token availability verified before proceeding

2. Comprehensive Coverage

  • βœ… All scripts covered: Every Python script has environment variables
  • βœ… Multiple variables: HF_TOKEN, HUGGING_FACE_HUB_TOKEN, HF_USERNAME, TRACKIO_DATASET_REPO
  • βœ… Consistent naming: All scripts use the same environment variable names
  • βœ… Error handling: Verification step catches missing tokens

3. Cross-Platform Compatibility

  • βœ… Bash compatible: Uses standard bash export syntax
  • βœ… Virtual environment aware: Properly handles venv activation
  • βœ… Token validation: Verifies token availability before use
  • βœ… Clear error messages: Descriptive error messages for debugging

Environment Variables Set

The following environment variables are now properly set and available in the virtual environment:

  1. HF_TOKEN - The Hugging Face token for authentication
  2. HUGGING_FACE_HUB_TOKEN - Alternative token variable for Python API
  3. HF_USERNAME - Username extracted from token
  4. TRACKIO_DATASET_REPO - Dataset repository for Trackio

Test Results

Environment Setup Test

$ python tests/test_environment_setup.py

πŸš€ Environment Setup Verification
==================================================
πŸ” Testing Environment Variables
[OK] HF_TOKEN: hf_FWrfleE...zuoF
[OK] HUGGING_FACE_HUB_TOKEN: hf_FWrfleE...zuoF
[OK] HF_USERNAME: Tonic...onic
[OK] TRACKIO_DATASET_REPO: Tonic/trac...ents

πŸ” Testing Launch Script Environment Setup
[OK] Found: export HF_TOKEN=
[OK] Found: export HUGGING_FACE_HUB_TOKEN=
[OK] Found: export HF_USERNAME=
[OK] Found: export TRACKIO_DATASET_REPO=
[OK] Found virtual environment activation
[OK] Found environment variable re-export after activation

[SUCCESS] ALL ENVIRONMENT TESTS PASSED!
[OK] Environment variables: Properly set
[OK] Virtual environment: Can access variables
[OK] Launch script: Properly configured

The environment setup is working correctly!

User Token Status

Token: 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 Status: βœ… Working correctly in virtual environment Username: Tonic (auto-detected)

Next Steps

The user can now run the launch script with confidence that the token will be properly available in the virtual environment:

./launch.sh

The script will:

  1. βœ… Set environment variables before creating virtual environment
  2. βœ… Re-export variables after activating virtual environment
  3. βœ… Verify token availability before proceeding
  4. βœ… Export variables before each Python script call
  5. βœ… Ensure all scripts have access to the token

No more token-related errors in the virtual environment! πŸŽ‰