Cloud Deployment Guide for SmolLM3 DPO Training

This guide walks through the sequence of commands needed to deploy SmolLM3 DPO training on a cloud GPU instance and run it for 6 epochs.

Prerequisites

Cloud Instance Requirements

GPU: NVIDIA GPU with 16GB+ VRAM (A100 or H100 preferred; the example instances in Step 1 use A10G, T4, and V100 cards)
RAM: 64GB+ system memory
Storage: 100GB+ SSD storage
OS: Ubuntu 20.04 or 22.04

Required Information

Before starting, gather these details:

Your Hugging Face username
Your Hugging Face token (with write permissions)
Your Trackio Space URL (if using monitoring)

Step-by-Step Deployment

Step 1: Launch Cloud Instance

Choose your cloud provider and launch an instance:

AWS (g5.2xlarge or g5.4xlarge)

# Launch instance with Ubuntu 22.04 and appropriate GPU
aws ec2 run-instances \
    --image-id ami-0c7217cdde317cfec \
    --instance-type g5.2xlarge \
    --key-name your-key-pair \
    --security-group-ids sg-xxxxxxxxx

Google Cloud (n1-standard-8 with T4/V100)

gcloud compute instances create smollm3-dpo \
    --zone=us-central1-a \
    --machine-type=n1-standard-8 \
    --accelerator="type=nvidia-tesla-t4,count=1" \
    --maintenance-policy=TERMINATE \
    --image-family=ubuntu-2204-lts \
    --image-project=ubuntu-os-cloud

Azure (Standard_NC6s_v3)

az vm create \
    --resource-group your-rg \
    --name smollm3-dpo \
    --image Canonical:0001-com-ubuntu-server-jammy:22_04-lts:latest \
    --size Standard_NC6s_v3 \
    --admin-username azureuser

Step 2: Connect to Instance

# SSH to your instance
ssh -i your-key.pem ubuntu@your-instance-ip

# Or for Azure
ssh azureuser@your-instance-ip

Step 3: Update System and Install Dependencies

# Update system
sudo apt-get update
sudo apt-get upgrade -y

# Install system dependencies
sudo apt-get install -y git curl wget unzip python3 python3-pip python3-venv

# Install the NVIDIA Container Toolkit
# (this does not install the GPU driver itself; confirm the driver is present with nvidia-smi)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

Step 4: Clone Repository and Setup Environment

# Clone your repository
git clone https://github.com/your-username/flexai-finetune.git
cd flexai-finetune

# Create virtual environment
python3 -m venv smollm3_env
source smollm3_env/bin/activate

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install project dependencies
pip install -r requirements.txt

# Install additional DPO dependencies
# Quote the version specifiers so the shell does not treat ">" as a redirect
pip install "trl>=0.7.0"
pip install "peft>=0.4.0"
pip install "accelerate>=0.20.0"

Step 5: Configure Authentication

# Set your Hugging Face token
export HF_TOKEN="your_huggingface_token_here"

# Log in to Hugging Face
hf auth login --token "$HF_TOKEN"

Step 6: Create Configuration Files

Create the DPO configuration file:

cat > config/train_smollm3_dpo_6epochs.py << 'EOF'
"""
SmolLM3 DPO Training Configuration - 6 Epochs
Optimized for cloud deployment
"""

from config.train_smollm3_dpo import SmolLM3DPOConfig

config = SmolLM3DPOConfig(
    # Model configuration
    model_name="HuggingFaceTB/SmolLM3-3B",
    max_seq_length=4096,
    use_flash_attention=True,
    use_gradient_checkpointing=True,
    
    # Training configuration
    batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    weight_decay=0.01,
    warmup_steps=100,
    max_iters=None,  # Will be calculated based on epochs
    eval_interval=100,
    log_interval=10,
    save_interval=500,
    
    # DPO configuration
    beta=0.1,
    max_prompt_length=2048,
    
    # Optimizer configuration
    optimizer="adamw",
    beta1=0.9,
    beta2=0.95,
    eps=1e-8,
    
    # Scheduler configuration
    scheduler="cosine",
    min_lr=1e-6,
    
    # Mixed precision
    fp16=True,
    bf16=False,
    
    # Logging and saving
    save_steps=500,
    eval_steps=100,
    logging_steps=10,
    save_total_limit=3,
    
    # Evaluation
    eval_strategy="steps",
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    load_best_model_at_end=True,
    
    # Data configuration
    data_dir="smoltalk_dataset",
    train_file="train.json",
    validation_file="validation.json",
    
    # Chat template configuration
    use_chat_template=True,
    chat_template_kwargs={
        "enable_thinking": False,
        "add_generation_prompt": True
    },
    
    # Trackio monitoring configuration
    enable_tracking=True,
    trackio_url="https://your-trackio-space.hf.space",  # Change this
    trackio_token=None,
    log_artifacts=True,
    log_metrics=True,
    log_config=True,
    experiment_name="smollm3_dpo_6epochs"
)
EOF
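
To make the role of beta in the configuration more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, in plain Python. It is not taken from this repository's train.py, which may compute the loss differently; the function name dpo_loss and the example log-probabilities are hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    # How much more the policy prefers 'chosen' over 'rejected' than the reference does
    margin = (policy_chosen_logp - policy_rejected_logp) - (ref_chosen_logp - ref_rejected_logp)
    # loss = -log(sigmoid(beta * margin)) = softplus(-beta * margin),
    # computed in a numerically stable form
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Hypothetical log-probabilities, for illustration only
print(dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.1))
print(dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.5))
```

With a positive margin, a larger beta lowers the loss faster, so the policy is pushed away from the reference model more aggressively. When the policy and reference agree (zero margin), the loss is log 2 regardless of beta.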

Step 7: Download and Prepare Dataset

# Create dataset preparation script
cat > prepare_dataset.py << 'EOF'
from datasets import load_dataset
import json
import os

# Load SmolTalk dataset. This dataset defines several configs, so one must
# be named explicitly; "all" (the full mixture) is used here.
print('Loading SmolTalk dataset...')
dataset = load_dataset('HuggingFaceTB/smoltalk', 'all')

# Create dataset directory
os.makedirs('smoltalk_dataset', exist_ok=True)

# Convert to DPO format (preference pairs)
def convert_to_dpo_format(example):
    # DPO needs 'prompt', 'chosen', and 'rejected' fields. SmolTalk is a chat
    # (SFT) dataset, so records without these fields come back empty and are
    # filtered out below. Adapt this function to your own preference data.
    return {
        'prompt': example.get('prompt', ''),
        'chosen': example.get('chosen', ''),
        'rejected': example.get('rejected', '')
    }

def collect_pairs(split):
    # Keep only complete preference pairs
    pairs = (convert_to_dpo_format(example) for example in split)
    return [p for p in pairs if p['prompt'] and p['chosen'] and p['rejected']]

# Process train split
train_data = collect_pairs(dataset['train'])

# Process validation split (fall back to 'test' if there is no 'validation' split)
val_split = 'validation' if 'validation' in dataset else 'test'
val_data = collect_pairs(dataset[val_split])

# Stop early rather than start training on an empty dataset
if not train_data:
    raise SystemExit('No complete prompt/chosen/rejected pairs found; adjust convert_to_dpo_format.')

# Save to files
with open('smoltalk_dataset/train.json', 'w') as f:
    json.dump(train_data, f, indent=2)

with open('smoltalk_dataset/validation.json', 'w') as f:
    json.dump(val_data, f, indent=2)

print(f'Dataset prepared: {len(train_data)} train samples, {len(val_data)} validation samples')
EOF

# Run dataset preparation
python prepare_dataset.py
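
Before training, it can help to confirm that every record really is a complete preference pair. The following is a minimal sketch of such a check; the helper name find_invalid_records and the sample records are hypothetical, and the three required keys are the ones the preparation script writes.

```python
REQUIRED_KEYS = ("prompt", "chosen", "rejected")

def find_invalid_records(records):
    """Return the indices of records missing a non-empty string for any required key."""
    bad = []
    for i, rec in enumerate(records):
        ok = all(isinstance(rec.get(k), str) and rec[k].strip() for k in REQUIRED_KEYS)
        if not ok:
            bad.append(i)
    return bad

# Hypothetical sample records, for illustration only
sample = [
    {"prompt": "What is 2 + 2?", "chosen": "4", "rejected": "5"},
    {"prompt": "Name a color.", "chosen": "Blue", "rejected": ""},
    {"prompt": "Say hi.", "chosen": "Hi!"},
]
print(find_invalid_records(sample))  # [1, 2]
```

To run the same check on the prepared data, load smoltalk_dataset/train.json with json.load and pass the resulting list to find_invalid_records.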

Step 8: Calculate Training Parameters

# Calculate training steps based on epochs
TOTAL_SAMPLES=$(python -c "import json; data=json.load(open('smoltalk_dataset/train.json')); print(len(data))")
BATCH_SIZE=2
GRADIENT_ACCUMULATION_STEPS=8
MAX_EPOCHS=6
EFFECTIVE_BATCH_SIZE=$((BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS))
STEPS_PER_EPOCH=$((TOTAL_SAMPLES / EFFECTIVE_BATCH_SIZE))
MAX_STEPS=$((STEPS_PER_EPOCH * MAX_EPOCHS))

echo "Training Configuration:"
echo "  Total samples: $TOTAL_SAMPLES"
echo "  Effective batch size: $EFFECTIVE_BATCH_SIZE"
echo "  Steps per epoch: $STEPS_PER_EPOCH"
echo "  Total training steps: $MAX_STEPS"
echo "  Training epochs: $MAX_EPOCHS"
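
The same arithmetic can be sketched in Python, which makes it easier to try different settings before committing to a run. The function name training_steps and the sample count of 10,000 are hypothetical; the batch size, accumulation steps, and epoch count are the values used above.

```python
def training_steps(total_samples, batch_size, grad_accum_steps, epochs):
    """Mirror the shell calculation: returns (effective batch, steps per epoch, total steps)."""
    effective_batch = batch_size * grad_accum_steps
    # Integer division, as in the shell arithmetic: a final partial batch is dropped
    steps_per_epoch = total_samples // effective_batch
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

# Hypothetical dataset of 10,000 preference pairs
print(training_steps(10_000, batch_size=2, grad_accum_steps=8, epochs=6))  # (16, 625, 3750)
```

Each step here is one optimizer update, so a dataset smaller than the effective batch size gives zero steps; check the sample count if MAX_STEPS comes out as 0.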

Step 9: Start DPO Training

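# Make sure the output directory exists and is writable by the current user
sudo mkdir -p /output-checkpoint
sudo chown "$USER" /output-checkpoint

# Start training with all parameters; output is also written to training.log
# Note: DPO fine-tunes pretrained weights, so confirm that --init_from scratch
# still loads model_name in your train.py before starting a long run.
python train.py config/train_smollm3_dpo_6epochs.py \
    --dataset_dir smoltalk_dataset \
    --out_dir /output-checkpoint \
    --init_from scratch \
    --max_iters "$MAX_STEPS" \
    --batch_size "$BATCH_SIZE" \
    --learning_rate 5e-6 \
    --gradient_accumulation_steps "$GRADIENT_ACCUMULATION_STEPS" \
    --max_seq_length 4096 \
    --save_steps 500 \
    --eval_steps 100 \
    --logging_steps 10 \
    --enable_tracking \
    --trackio_url "https://your-trackio-space.hf.space" \
    --experiment_name "smollm3_dpo_6epochs" 2>&1 | tee training.log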

Step 10: Push Model to Hugging Face Hub

# Push the trained model
python push_to_huggingface.py /output-checkpoint "your-username/smollm3-dpo-6epochs" \
    --token "$HF_TOKEN" \
    --trackio-url "https://your-trackio-space.hf.space" \
    --experiment-name "smollm3_dpo_6epochs"

Step 11: Test the Uploaded Model

# Test the model
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

print('Loading uploaded model...')
model = AutoModelForCausalLM.from_pretrained('your-username/smollm3-dpo-6epochs', torch_dtype=torch.float16, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('your-username/smollm3-dpo-6epochs')

print('Testing model generation...')
prompt = 'Hello, how are you?'
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f'Prompt: {prompt}')
print(f'Response: {response}')
print('✅ Model test completed successfully!')
"

Complete One-Line Deployment

If you want to run everything automatically, use the deployment script:

# Make script executable
chmod +x cloud_deployment.sh

# Edit configuration in the script first
nano cloud_deployment.sh
# Change these variables:
# - REPO_NAME="your-username/smollm3-dpo-6epochs"
# - TRACKIO_URL="https://your-trackio-space.hf.space"
# - HF_TOKEN="your_hf_token_here"

# Run the complete deployment
./cloud_deployment.sh

Monitoring and Debugging

Check GPU Usage

# Monitor GPU usage during training
watch -n 1 nvidia-smi

Check Training Logs

# Monitor training progress
tail -f training.log

# Check system resources
htop

Monitor Trackio

# Check if Trackio is logging properly
curl -s "https://your-trackio-space.hf.space" | grep -i "experiment"

Expected Timeline

Setup: 15-30 minutes
Dataset preparation: 5-10 minutes
Training (6 epochs): 4-8 hours (depending on GPU)
Model upload: 10-30 minutes
Testing: 5-10 minutes

Troubleshooting

Common Issues

1. Out of Memory (OOM)

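# Halve the batch size and double gradient accumulation
# (the effective batch size stays at 16)
BATCH_SIZE=1
GRADIENT_ACCUMULATION_STEPS=16

# Gradient checkpointing is already enabled in the config;
# lowering max_seq_length is another way to reduce memory use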
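
As a small illustration of why this change does not alter the training schedule, the sketch below lists every batch size and accumulation pair that keeps the same effective batch size. The helper name accumulation_options is hypothetical.

```python
def accumulation_options(effective_batch):
    """List (batch_size, gradient_accumulation_steps) pairs whose product is effective_batch."""
    return [(b, effective_batch // b)
            for b in range(1, effective_batch + 1)
            if effective_batch % b == 0]

print(accumulation_options(16))  # [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)]
```

Any pair from this list leaves the step counts from Step 8 unchanged; smaller batch sizes trade per-step memory for more forward and backward passes per optimizer update.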

2. Slow Training

# Check GPU utilization
nvidia-smi

# Check if mixed precision is working
# Look for "fp16" in training logs

3. Dataset Issues

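# Check dataset format (first lines of the pretty-printed JSON)
head -n 5 smoltalk_dataset/train.json

# Verify dataset size (count records; wc -l would count lines, not samples)
python -c "import json; print(len(json.load(open('smoltalk_dataset/train.json'))))"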

4. Authentication Issues

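# Test HF token
python -c "
from huggingface_hub import HfApi
api = HfApi(token='$HF_TOKEN')
# whoami() makes an authenticated request and raises if the token is rejected
print('Token is valid for user:', api.whoami()['name'])
"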

Cost Estimation

AWS (g5.2xlarge)

Instance: $0.526/hour
Training time: 6 hours
Total cost: ~$3.16

Google Cloud (n1-standard-8 + T4)

Instance: $0.38/hour
Training time: 6 hours
Total cost: ~$2.28

Azure (Standard_NC6s_v3)

Instance: $0.90/hour
Training time: 6 hours
Total cost: ~$5.40
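
The totals above are the hourly rate multiplied by a 6-hour run. A minimal sketch of the same calculation, applied to the 4-8 hour range from the Expected Timeline, is shown below; the function name training_cost is hypothetical, and the rates are copied from this section rather than looked up from current provider pricing.

```python
def training_cost(hourly_rate_usd, hours):
    """Instance cost in USD for a run of the given length, rounded to cents."""
    return round(hourly_rate_usd * hours, 2)

# Hourly rates as listed in this guide
rates = {
    "AWS (g5.2xlarge)": 0.526,
    "Google Cloud (n1-standard-8 + T4)": 0.38,
    "Azure (Standard_NC6s_v3)": 0.90,
}

for name, rate in rates.items():
    low, high = training_cost(rate, 4), training_cost(rate, 8)
    print(f"{name}: ${low:.2f} to ${high:.2f}")
```

Setup, upload, and testing time are billed as well, so the instance will typically run somewhat longer than the training time alone.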

Next Steps

After successful deployment:

Monitor training in your Trackio Space
Check model repository on Hugging Face Hub
Test the model with different prompts
Share your model with the community
Iterate and improve based on results

Support

Training issues: Check logs and GPU utilization
Upload issues: Verify HF token and repository permissions
Monitoring issues: Check Trackio Space configuration
Performance issues: Adjust batch size and learning rate

Your SmolLM3 DPO model will be ready for use after training completes!