# Cloud Deployment Guide for SmolLM3 DPO Training

This guide provides the exact sequence of commands to deploy and run SmolLM3 DPO training for 6 epochs on a cloud computing instance.

## Prerequisites

### Cloud Instance Requirements

- **GPU**: NVIDIA A100, H100, or comparable (16GB+ VRAM minimum)
- **RAM**: 64GB+ system memory
- **Storage**: 100GB+ SSD storage
- **OS**: Ubuntu 20.04 or 22.04

### Required Information

Before starting, gather these details (the sketch after this list shows one way to keep them handy):

- Your Hugging Face username
- Your Hugging Face token (with write permissions)
- Your Trackio Space URL (if using monitoring)
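One convenient pattern, shown below as a sketch (the variable names other than `HF_TOKEN` are illustrative, not required by any later script), is to export these values once at the start of your session so later commands can reference them:

```bash
# Hypothetical placeholders -- replace each value with your own
export HF_USERNAME="your-username"
export HF_TOKEN="your_huggingface_token_here"
export TRACKIO_URL="https://your-trackio-space.hf.space"
```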
## Step-by-Step Deployment

### Step 1: Launch Cloud Instance

Choose your cloud provider and launch an instance:

#### AWS (g5.2xlarge or g5.4xlarge)

```bash
# Launch an instance with Ubuntu 22.04 and an appropriate GPU
aws ec2 run-instances \
    --image-id ami-0c7217cdde317cfec \
    --instance-type g5.2xlarge \
    --key-name your-key-pair \
    --security-group-ids sg-xxxxxxxxx
```

#### Google Cloud (n1-standard-8 with T4/V100)

```bash
gcloud compute instances create smollm3-dpo \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator="type=nvidia-tesla-t4,count=1" \
    --image-family=ubuntu-2204-lts \
    --image-project=ubuntu-os-cloud
```

#### Azure (Standard_NC6s_v3)

```bash
az vm create \
    --resource-group your-rg \
    --name smollm3-dpo \
    --image Canonical:0001-com-ubuntu-server-jammy:22_04-lts:latest \
    --size Standard_NC6s_v3 \
    --admin-username azureuser
```

### Step 2: Connect to Instance

```bash
# SSH to your instance
ssh -i your-key.pem ubuntu@your-instance-ip

# Or for Azure
ssh azureuser@your-instance-ip
```
### Step 3: Update System and Install Dependencies

```bash
# Update system
sudo apt-get update
sudo apt-get upgrade -y

# Install system dependencies
sudo apt-get install -y git curl wget unzip python3 python3-pip python3-venv

# Set up the NVIDIA Container Toolkit repository and install the toolkit.
# Note: this does NOT install the GPU driver itself; most cloud GPU images
# ship with the driver pre-installed.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
```
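Before moving on, verify that the driver stack is working; if this fails, install the NVIDIA driver for your GPU before continuing:

```bash
# Should print the driver version, CUDA version, and your GPU
nvidia-smi
```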
### Step 4: Clone Repository and Setup Environment

```bash
# Clone your repository
git clone https://github.com/your-username/flexai-finetune.git
cd flexai-finetune

# Create virtual environment
python3 -m venv smollm3_env
source smollm3_env/bin/activate

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install project dependencies
pip install -r requirements.txt

# Install additional DPO dependencies (quote the specifiers so the shell
# does not treat ">=" as an output redirect)
pip install "trl>=0.7.0"
pip install "peft>=0.4.0"
pip install "accelerate>=0.20.0"
```
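It is worth confirming that the CUDA build of PyTorch actually sees the GPU before going further:

```bash
# Sanity-check the PyTorch installation
python -c "import torch; print(torch.__version__); print('CUDA available:', torch.cuda.is_available())"
```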
### Step 5: Configure Authentication

```bash
# Set your Hugging Face token
export HF_TOKEN="your_huggingface_token_here"

# Login to Hugging Face
huggingface-cli login --token "$HF_TOKEN"
```
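To confirm the login succeeded, ask the Hub which account the token belongs to:

```bash
# Prints the username associated with the stored token
huggingface-cli whoami
```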
### Step 6: Create Configuration Files

Create the DPO configuration file:

```bash
cat > config/train_smollm3_dpo_6epochs.py << 'EOF'
"""
SmolLM3 DPO Training Configuration - 6 Epochs
Optimized for cloud deployment
"""
from config.train_smollm3_dpo import SmolLM3DPOConfig

config = SmolLM3DPOConfig(
    # Model configuration
    model_name="HuggingFaceTB/SmolLM3-3B",
    max_seq_length=4096,
    use_flash_attention=True,
    use_gradient_checkpointing=True,

    # Training configuration
    batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    weight_decay=0.01,
    warmup_steps=100,
    max_iters=None,  # Will be calculated based on epochs
    eval_interval=100,
    log_interval=10,
    save_interval=500,

    # DPO configuration
    beta=0.1,
    max_prompt_length=2048,

    # Optimizer configuration
    optimizer="adamw",
    beta1=0.9,
    beta2=0.95,
    eps=1e-8,

    # Scheduler configuration
    scheduler="cosine",
    min_lr=1e-6,

    # Mixed precision
    fp16=True,
    bf16=False,

    # Logging and saving
    save_steps=500,
    eval_steps=100,
    logging_steps=10,
    save_total_limit=3,

    # Evaluation
    eval_strategy="steps",
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    load_best_model_at_end=True,

    # Data configuration
    data_dir="smoltalk_dataset",
    train_file="train.json",
    validation_file="validation.json",

    # Chat template configuration
    use_chat_template=True,
    chat_template_kwargs={
        "enable_thinking": False,
        "add_generation_prompt": True
    },

    # Trackio monitoring configuration
    enable_tracking=True,
    trackio_url="https://your-trackio-space.hf.space",  # Change this
    trackio_token=None,
    log_artifacts=True,
    log_metrics=True,
    log_config=True,
    experiment_name="smollm3_dpo_6epochs"
)
EOF
```
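As a quick sanity check (assuming the repository exposes `config` as an importable package, as the file's own import suggests), import the new config and print a couple of fields:

```bash
python -c "
from config.train_smollm3_dpo_6epochs import config
print(config.model_name, config.learning_rate, config.experiment_name)
"
```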
### Step 7: Download and Prepare Dataset

```bash
# Create dataset preparation script
cat > prepare_dataset.py << 'EOF'
from datasets import load_dataset
import json
import os

# Load SmolTalk dataset (the 'all' config combines all subsets)
print('Loading SmolTalk dataset...')
dataset = load_dataset('HuggingFaceTB/smoltalk', 'all')

# Create dataset directory
os.makedirs('smoltalk_dataset', exist_ok=True)

# Convert to DPO format (preference pairs)
def convert_to_dpo_format(example):
    # SmolTalk is a conversational dataset and does not ship with ready-made
    # preference pairs; this is a simplified pass-through. Adjust the field
    # mapping to however your preference data is actually structured.
    return {
        'prompt': example.get('prompt', ''),
        'chosen': example.get('chosen', ''),
        'rejected': example.get('rejected', '')
    }

# Process train split
train_data = []
for example in dataset['train']:
    dpo_example = convert_to_dpo_format(example)
    if dpo_example['prompt'] and dpo_example['chosen'] and dpo_example['rejected']:
        train_data.append(dpo_example)

# Process the held-out split (the dataset may expose it as 'test'
# rather than 'validation')
val_split = 'validation' if 'validation' in dataset else 'test'
val_data = []
for example in dataset[val_split]:
    dpo_example = convert_to_dpo_format(example)
    if dpo_example['prompt'] and dpo_example['chosen'] and dpo_example['rejected']:
        val_data.append(dpo_example)

# Save to files
with open('smoltalk_dataset/train.json', 'w') as f:
    json.dump(train_data, f, indent=2)
with open('smoltalk_dataset/validation.json', 'w') as f:
    json.dump(val_data, f, indent=2)

print(f'Dataset prepared: {len(train_data)} train samples, {len(val_data)} validation samples')
EOF

# Run dataset preparation
python prepare_dataset.py
```
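Before training, it helps to peek at what the conversion actually produced; an empty file here means the field mapping in `convert_to_dpo_format` needs adjusting:

```bash
# Inspect the prepared data (a sketch; adjust the path if you changed it)
python -c "
import json
data = json.load(open('smoltalk_dataset/train.json'))
print(len(data), 'train samples')
if data:
    print(json.dumps(data[0], indent=2)[:500])
else:
    print('No samples survived the filter -- adjust convert_to_dpo_format')
"
```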
### Step 8: Calculate Training Parameters

```bash
# Calculate training steps based on epochs
TOTAL_SAMPLES=$(python -c "import json; data=json.load(open('smoltalk_dataset/train.json')); print(len(data))")
BATCH_SIZE=2
GRADIENT_ACCUMULATION_STEPS=8
MAX_EPOCHS=6

EFFECTIVE_BATCH_SIZE=$((BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS))
# Bash integer division truncates, so a final partial batch is not counted
STEPS_PER_EPOCH=$((TOTAL_SAMPLES / EFFECTIVE_BATCH_SIZE))
MAX_STEPS=$((STEPS_PER_EPOCH * MAX_EPOCHS))

echo "Training Configuration:"
echo "  Total samples: $TOTAL_SAMPLES"
echo "  Effective batch size: $EFFECTIVE_BATCH_SIZE"
echo "  Steps per epoch: $STEPS_PER_EPOCH"
echo "  Total training steps: $MAX_STEPS"
echo "  Training epochs: $MAX_EPOCHS"
```
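If you would rather count any final partial batch as a full step, ceiling division with the same variables does it:

```bash
# Ceiling division: rounds up instead of truncating
STEPS_PER_EPOCH=$(( (TOTAL_SAMPLES + EFFECTIVE_BATCH_SIZE - 1) / EFFECTIVE_BATCH_SIZE ))
```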
### Step 9: Start DPO Training

```bash
# Start training with all parameters (uses the variables set in Step 8,
# so run this in the same shell session)
python train.py config/train_smollm3_dpo_6epochs.py \
    --dataset_dir smoltalk_dataset \
    --out_dir /output-checkpoint \
    --init_from scratch \
    --max_iters $MAX_STEPS \
    --batch_size $BATCH_SIZE \
    --learning_rate 5e-6 \
    --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
    --max_seq_length 4096 \
    --save_steps 500 \
    --eval_steps 100 \
    --logging_steps 10 \
    --enable_tracking \
    --trackio_url "https://your-trackio-space.hf.space" \
    --experiment_name "smollm3_dpo_6epochs"
```
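Since training runs for hours, you will usually want it to survive an SSH disconnect. One common pattern (a sketch; `tmux` or `screen` work just as well, and the flags are abbreviated here, so pass the same set as above) is `nohup` with output redirected to the `training.log` file that the monitoring section below tails:

```bash
# Run training in the background and capture all output
nohup python train.py config/train_smollm3_dpo_6epochs.py \
    --dataset_dir smoltalk_dataset \
    --out_dir /output-checkpoint \
    --max_iters $MAX_STEPS \
    > training.log 2>&1 &
echo "Training started with PID $!"
```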
### Step 10: Push Model to Hugging Face Hub

```bash
# Push the trained model
python push_to_huggingface.py /output-checkpoint "your-username/smollm3-dpo-6epochs" \
    --token "$HF_TOKEN" \
    --trackio-url "https://your-trackio-space.hf.space" \
    --experiment-name "smollm3_dpo_6epochs"
```
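To confirm the upload actually landed, query the Hub for the new repository (a sketch using `huggingface_hub`; replace the repo id with yours):

```bash
python -c "
from huggingface_hub import HfApi
info = HfApi(token='$HF_TOKEN').model_info('your-username/smollm3-dpo-6epochs')
print('Found repo:', info.id, '| private:', info.private)
"
```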
### Step 11: Test the Uploaded Model

```bash
# Test the model
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

print('Loading uploaded model...')
model = AutoModelForCausalLM.from_pretrained('your-username/smollm3-dpo-6epochs', torch_dtype=torch.float16, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('your-username/smollm3-dpo-6epochs')

print('Testing model generation...')
prompt = 'Hello, how are you?'
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(f'Prompt: {prompt}')
print(f'Response: {response}')
print('✅ Model test completed successfully!')
"
```
## Complete Scripted Deployment

To run everything end to end automatically, use the deployment script:

```bash
# Make the script executable
chmod +x cloud_deployment.sh

# Edit the configuration in the script first
nano cloud_deployment.sh
# Change these variables:
# - REPO_NAME="your-username/smollm3-dpo-6epochs"
# - TRACKIO_URL="https://your-trackio-space.hf.space"
# - HF_TOKEN="your_hf_token_here"

# Run the complete deployment
./cloud_deployment.sh
```
## Monitoring and Debugging

### Check GPU Usage

```bash
# Monitor GPU usage during training
watch -n 1 nvidia-smi
```

### Check Training Logs

```bash
# Monitor training progress (assumes you redirected training output to
# training.log, as shown at the end of Step 9)
tail -f training.log

# Check system resources
htop
```

### Monitor Trackio

```bash
# Check that the Trackio Space is responding and serving experiment data
curl -s "https://your-trackio-space.hf.space" | grep -i "experiment"
```
## Expected Timeline

- **Setup**: 15-30 minutes
- **Dataset preparation**: 5-10 minutes
- **Training (6 epochs)**: 4-8 hours (depending on GPU)
- **Model upload**: 10-30 minutes
- **Testing**: 5-10 minutes
## Troubleshooting

### Common Issues

#### 1. Out of Memory (OOM)

```bash
# Reduce the batch size and raise gradient accumulation so the
# effective batch size stays the same
BATCH_SIZE=1
GRADIENT_ACCUMULATION_STEPS=16

# Gradient checkpointing is already enabled in the config
```
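To actually apply the reduced settings, pass them through the same CLI flags used in Step 9 (illustrative values; tune to your GPU):

```bash
python train.py config/train_smollm3_dpo_6epochs.py \
    --batch_size 1 \
    --gradient_accumulation_steps 16
```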
#### 2. Slow Training

```bash
# Check GPU utilization
nvidia-smi

# Check that mixed precision is active: look for "fp16" in the training logs
```

#### 3. Dataset Issues

```bash
# Inspect the start of the dataset file
head -n 5 smoltalk_dataset/train.json

# Verify the number of samples (the file is pretty-printed JSON, so
# `wc -l` would count lines, not samples)
python -c "import json; print(len(json.load(open('smoltalk_dataset/train.json'))))"
```
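If the counts look wrong, a short check like the following (a sketch against the prepared file) reports how many records are missing one of the three required DPO fields:

```bash
python -c "
import json
data = json.load(open('smoltalk_dataset/train.json'))
bad = [i for i, ex in enumerate(data)
       if not all(ex.get(k) for k in ('prompt', 'chosen', 'rejected'))]
print(f'{len(data)} samples, {len(bad)} malformed')
"
```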
#### 4. Authentication Issues

```bash
# Test the HF token by asking the Hub who it belongs to
python -c "
from huggingface_hub import HfApi
api = HfApi(token='$HF_TOKEN')
print('Token is valid for user:', api.whoami()['name'])
"
```
## Cost Estimation

The figures below are rough on-demand estimates; rates vary by region and change over time, so check your provider's current pricing before launching.

### AWS (g5.2xlarge)
- **Instance**: ~$1.21/hour
- **Training time**: 6 hours
- **Total cost**: ~$7.27

### Google Cloud (n1-standard-8 + T4)
- **Instance**: ~$0.73/hour (VM plus T4 accelerator)
- **Training time**: 6 hours
- **Total cost**: ~$4.40

### Azure (Standard_NC6s_v3)
- **Instance**: ~$3.06/hour
- **Training time**: 6 hours
- **Total cost**: ~$18.40
## Next Steps

After successful deployment:

1. **Monitor training** in your Trackio Space
2. **Check model repository** on Hugging Face Hub
3. **Test the model** with different prompts
4. **Share your model** with the community
5. **Iterate and improve** based on results

## Support

- **Training issues**: Check logs and GPU utilization
- **Upload issues**: Verify HF token and repository permissions
- **Monitoring issues**: Check Trackio Space configuration
- **Performance issues**: Adjust batch size and learning rate

Your SmolLM3 DPO model will be ready for use after training completes!