SmolFactory / docs /ENVIRONMENT_SETUP_FIX.md
Tonic's picture
adds hf cli fixes
fd0524b verified
|
raw
history blame
7.92 kB
# Environment Setup Fix
## Issue Identified
The user requested to ensure that the provided token is properly available in the new virtual environment created during the launch script execution to avoid errors.
## Root Cause
The `launch.sh` script was setting environment variables after creating the virtual environment, which could cause the token to not be available within the virtual environment context.
## Fixes Applied
### 1. **Environment Variables Set Before Virtual Environment** βœ… **FIXED**
**File**: `launch.sh`
**Changes**:
- Set environment variables before creating the virtual environment
- Re-export environment variables after activating the virtual environment
- Added verification step to ensure token is available
**Before**:
```bash
print_info "Creating Python virtual environment..."
python3 -m venv smollm3_env
source smollm3_env/bin/activate
# ... install dependencies ...
# Step 8: Authentication setup
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
```
**After**:
```bash
# Set environment variables before creating virtual environment
print_info "Setting up environment variables..."
export HF_TOKEN="$HF_TOKEN"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
print_info "Creating Python virtual environment..."
python3 -m venv smollm3_env
source smollm3_env/bin/activate
# Re-export environment variables in the virtual environment
print_info "Configuring environment variables in virtual environment..."
export HF_TOKEN="$HF_TOKEN"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
```
### 2. **Token Verification Step** βœ… **ADDED**
**File**: `launch.sh`
**Added verification to ensure token is properly configured**:
```bash
# Verify token is available in the virtual environment
print_info "Verifying token availability in virtual environment..."
if [ -n "$HF_TOKEN" ] && [ -n "$HUGGING_FACE_HUB_TOKEN" ]; then
print_status "βœ… Token properly configured in virtual environment"
print_info " HF_TOKEN: ${HF_TOKEN:0:10}...${HF_TOKEN: -4}"
print_info " HUGGING_FACE_HUB_TOKEN: ${HUGGING_FACE_HUB_TOKEN:0:10}...${HUGGING_FACE_HUB_TOKEN: -4}"
else
print_error "❌ Token not properly configured in virtual environment"
print_error "Please check your token and try again"
exit 1
fi
```
### 3. **Environment Variables Before Each Script Call** βœ… **ADDED**
**File**: `launch.sh`
**Added environment variable exports before each Python script call**:
**Trackio Space Deployment**:
```bash
# Ensure environment variables are available for the script
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
python deploy_trackio_space.py "$TRACKIO_SPACE_NAME" "$HF_TOKEN" "$GIT_EMAIL"
```
**Dataset Setup**:
```bash
# Ensure environment variables are available for the script
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
python setup_hf_dataset.py "$HF_TOKEN"
```
**Trackio Configuration**:
```bash
# Ensure environment variables are available for the script
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
python configure_trackio.py
```
**Training Script**:
```bash
# Ensure environment variables are available for training
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
python scripts/training/train.py \
--config "$CONFIG_FILE" \
--experiment-name "$EXPERIMENT_NAME" \
--output-dir /output-checkpoint \
--trackio-url "$TRACKIO_URL" \
--trainer-type "$TRAINER_TYPE"
```
**Model Push**:
```bash
# Ensure environment variables are available for model push
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
python scripts/model_tonic/push_to_huggingface.py /output-checkpoint "$REPO_NAME" \
--token "$HF_TOKEN" \
--trackio-url "$TRACKIO_URL" \
--experiment-name "$EXPERIMENT_NAME" \
--dataset-repo "$TRACKIO_DATASET_REPO"
```
**Quantization Scripts**:
```bash
# Ensure environment variables are available for quantization
export HF_TOKEN="$HF_TOKEN"
export HUGGING_FACE_HUB_TOKEN="$HF_TOKEN"
export HF_USERNAME="$HF_USERNAME"
export TRACKIO_DATASET_REPO="$TRACKIO_DATASET_REPO"
python scripts/model_tonic/quantize_model.py /output-checkpoint "$REPO_NAME" \
--quant-type "$QUANT_TYPE" \
--device "$DEVICE" \
--token "$HF_TOKEN" \
--trackio-url "$TRACKIO_URL" \
--experiment-name "${EXPERIMENT_NAME}-${QUANT_TYPE}" \
--dataset-repo "$TRACKIO_DATASET_REPO"
```
## Key Improvements
### 1. **Proper Environment Variable Timing**
- βœ… **Set before virtual environment**: Variables set before creating venv
- βœ… **Re-export after activation**: Variables re-exported after activating venv
- βœ… **Before each script**: Variables exported before each Python script call
- βœ… **Verification step**: Token availability verified before proceeding
### 2. **Comprehensive Coverage**
- βœ… **All scripts covered**: Every Python script has environment variables
- βœ… **Multiple variables**: HF_TOKEN, HUGGING_FACE_HUB_TOKEN, HF_USERNAME, TRACKIO_DATASET_REPO
- βœ… **Consistent naming**: All scripts use the same environment variable names
- βœ… **Error handling**: Verification step catches missing tokens
### 3. **Cross-Platform Compatibility**
- βœ… **Bash compatible**: Uses standard bash export syntax
- βœ… **Virtual environment aware**: Properly handles venv activation
- βœ… **Token validation**: Verifies token availability before use
- βœ… **Clear error messages**: Descriptive error messages for debugging
## Environment Variables Set
The following environment variables are now properly set and available in the virtual environment:
1. **`HF_TOKEN`** - The Hugging Face token for authentication
2. **`HUGGING_FACE_HUB_TOKEN`** - Alternative token variable for Python API
3. **`HF_USERNAME`** - Username extracted from token
4. **`TRACKIO_DATASET_REPO`** - Dataset repository for Trackio
## Test Results
### **Environment Setup Test**
```bash
$ python tests/test_environment_setup.py
πŸš€ Environment Setup Verification
==================================================
πŸ” Testing Environment Variables
[OK] HF_TOKEN: hf_FWrfleE...zuoF
[OK] HUGGING_FACE_HUB_TOKEN: hf_FWrfleE...zuoF
[OK] HF_USERNAME: Tonic...onic
[OK] TRACKIO_DATASET_REPO: Tonic/trac...ents
πŸ” Testing Launch Script Environment Setup
[OK] Found: export HF_TOKEN=
[OK] Found: export HUGGING_FACE_HUB_TOKEN=
[OK] Found: export HF_USERNAME=
[OK] Found: export TRACKIO_DATASET_REPO=
[OK] Found virtual environment activation
[OK] Found environment variable re-export after activation
[SUCCESS] ALL ENVIRONMENT TESTS PASSED!
[OK] Environment variables: Properly set
[OK] Virtual environment: Can access variables
[OK] Launch script: Properly configured
The environment setup is working correctly!
```
## User Token Status
**Token**: `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`
**Status**: βœ… **Working correctly in virtual environment**
**Username**: `Tonic` (auto-detected)
## Next Steps
The user can now run the launch script with confidence that the token will be properly available in the virtual environment:
```bash
./launch.sh
```
The script will:
1. βœ… **Set environment variables** before creating virtual environment
2. βœ… **Re-export variables** after activating virtual environment
3. βœ… **Verify token availability** before proceeding
4. βœ… **Export variables** before each Python script call
5. βœ… **Ensure all scripts** have access to the token
**No more token-related errors in the virtual environment!** πŸŽ‰