
Model Recovery and Deployment Guide

This guide will help you recover your trained model from the cloud instance and deploy it to Hugging Face Hub with quantization.

Prerequisites

  1. Hugging Face Token: You need a Hugging Face token with write permissions
  2. Cloud Instance Access: SSH access to your cloud instance
  3. Model Files: Your trained model should be in /output-checkpoint/ on the cloud instance

Step 1: Connect to Your Cloud Instance

ssh root@your-cloud-instance-ip
cd ~/smollm3_finetune

Step 2: Set Your Hugging Face Token

export HF_TOKEN=your_huggingface_token_here

Replace your_huggingface_token_here with your actual Hugging Face token.
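
To confirm the token is set and valid before deploying, you can run a quick check with huggingface_hub (a minimal sketch; whoami simply reports which account the token authenticates as):

python3 - <<'EOF'
import os
from huggingface_hub import HfApi

# Fails loudly if HF_TOKEN is missing or invalid
token = os.environ["HF_TOKEN"]
user = HfApi(token=token).whoami()
print(f"Authenticated as: {user['name']}")
EOF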

Step 3: Verify Model Files

Check that your model files exist:

ls -la /output-checkpoint/

You should see files like:

  • config.json
  • model.safetensors.index.json
  • model-00001-of-00002.safetensors
  • model-00002-of-00002.safetensors
  • tokenizer.json
  • tokenizer_config.json
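
If you prefer a programmatic check, here is a minimal Python sketch (the required file names mirror the list above; adjust the checkpoint path if yours differs):

python3 - <<'EOF'
import glob
import os

CHECKPOINT_DIR = "/output-checkpoint"
REQUIRED = [
    "config.json",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer_config.json",
]

missing = [f for f in REQUIRED if not os.path.exists(os.path.join(CHECKPOINT_DIR, f))]
# The number of weight shards varies, so match them with a glob instead
shards = glob.glob(os.path.join(CHECKPOINT_DIR, "model-*.safetensors"))

if missing or not shards:
    print(f"Missing files: {missing}, weight shards found: {len(shards)}")
else:
    print(f"All required files present ({len(shards)} weight shard(s))")
EOF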

Step 4: Update Configuration

Edit the deployment script to use your Hugging Face username:

nano cloud_deploy.py

Change this line:

REPO_NAME = "your-username/smollm3-finetuned"  # Change to your HF username and desired repo name

To your actual username, for example:

REPO_NAME = "tonic/smollm3-finetuned"

Step 5: Run the Deployment

Execute the deployment script:

python3 cloud_deploy.py

This will:

  1. ✅ Validate your model files
  2. ✅ Install required dependencies (torchao, huggingface_hub)
  3. ✅ Push the main model to Hugging Face Hub
  4. ✅ Create quantized versions (int8 and int4)
  5. ✅ Push quantized models to subdirectories

Step 6: Verify Deployment

After successful deployment, you can verify:

  1. Main Model: https://huggingface.co/your-username/smollm3-finetuned
  2. int8 Quantized: https://huggingface.co/your-username/smollm3-finetuned/tree/main/int8
  3. int4 Quantized: https://huggingface.co/your-username/smollm3-finetuned/tree/main/int4
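
You can also confirm from the command line that every file (including the int8/ and int4/ subdirectories) reached the Hub. A minimal sketch using huggingface_hub; substitute your own repo id:

python3 - <<'EOF'
from huggingface_hub import HfApi

# Lists every file in the deployed repo, including quantized subfolders
files = HfApi().list_repo_files("your-username/smollm3-finetuned")
for f in sorted(files):
    print(f)
EOF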

Alternative: Manual Deployment

If you prefer to run the steps manually:

1. Push Main Model Only

python3 scripts/model_tonic/push_to_huggingface.py \
    /output-checkpoint/ \
    your-username/smollm3-finetuned \
    --hf-token $HF_TOKEN \
    --author-name "Your Name" \
    --model-description "A fine-tuned SmolLM3 model for improved text generation"
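
If the script is unavailable, the push can be approximated directly with huggingface_hub. This is a hedged sketch, not the script's exact behavior: it uploads the checkpoint folder as-is and does not generate a model card:

python3 - <<'EOF'
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
repo_id = "your-username/smollm3-finetuned"  # change to your repo

# Create the repo if needed, then upload the whole checkpoint directory
api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_folder(folder_path="/output-checkpoint", repo_id=repo_id, repo_type="model")
EOF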

2. Quantize and Push (Optional)

# int8 quantization (GPU optimized)
python3 scripts/model_tonic/quantize_model.py \
    /output-checkpoint/ \
    your-username/smollm3-finetuned \
    --quant-type int8_weight_only \
    --hf-token $HF_TOKEN

# int4 quantization (CPU optimized)
python3 scripts/model_tonic/quantize_model.py \
    /output-checkpoint/ \
    your-username/smollm3-finetuned \
    --quant-type int4_weight_only \
    --hf-token $HF_TOKEN
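
Under the hood, the quantization script relies on torchao. If you want to quantize manually instead, recent transformers releases expose torchao via TorchAoConfig. A minimal int8 weight-only sketch; note it pushes to a separate repo (a hypothetical name) rather than the int8/ subfolder the script uses, and torchao checkpoints must be saved in pickle format rather than safetensors:

python3 - <<'EOF'
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

# Apply int8 weight-only quantization at load time via torchao
quant_config = TorchAoConfig("int8_weight_only")
model = AutoModelForCausalLM.from_pretrained(
    "/output-checkpoint",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)

# torchao-quantized weights cannot be serialized to safetensors;
# the repo name here is illustrative only
model.push_to_hub("your-username/smollm3-finetuned-int8", safe_serialization=False)
EOF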

Troubleshooting

Common Issues

  1. HF_TOKEN not set

    export HF_TOKEN=your_token_here
    
  2. Model files not found

    ls -la /output-checkpoint/
    

    Make sure the training completed successfully.

  3. Dependencies missing

    pip install torchao huggingface_hub
    
  4. Permission denied

    chmod +x cloud_deploy.py
    chmod +x recover_model.py
    

Error Messages

  • "Missing required model files": Check that your model training completed successfully
  • "Repository creation failed": Verify your HF token has write permissions
  • "Quantization failed": Check GPU memory availability or try CPU quantization

Model Usage

Once deployed, you can use your model:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Main model
model = AutoModelForCausalLM.from_pretrained("your-username/smollm3-finetuned")
tokenizer = AutoTokenizer.from_pretrained("your-username/smollm3-finetuned")

# int8 quantized (GPU optimized): the quantized weights live in the int8/
# subfolder of the same repo, so pass subfolder= rather than appending it
# to the repo id (loading these checkpoints requires torchao)
model = AutoModelForCausalLM.from_pretrained("your-username/smollm3-finetuned", subfolder="int8")

# int4 quantized (CPU optimized)
model = AutoModelForCausalLM.from_pretrained("your-username/smollm3-finetuned", subfolder="int4")

# the tokenizer lives at the repo root and is shared by all variants
tokenizer = AutoTokenizer.from_pretrained("your-username/smollm3-finetuned")

# Generate text
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

File Structure

After deployment, your repository will have:

your-username/smollm3-finetuned/
├── README.md (model card)
├── config.json
├── model.safetensors.index.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── tokenizer.json
├── tokenizer_config.json
├── int8/ (quantized model for GPU)
│   ├── README.md
│   ├── config.json
│   └── pytorch_model.bin
└── int4/ (quantized model for CPU)
    ├── README.md
    ├── config.json
    └── pytorch_model.bin

Success Indicators

✅ Successful deployment shows:

  • Confirmation that validation, upload, and quantization each completed
  • The URL of your model repository on Hugging Face Hub

❌ Failed deployment shows:

  • Error messages about missing files or permissions
  • "Model recovery and deployment failed!"

Next Steps

After successful deployment:

  1. Test your model on Hugging Face Hub
  2. Share your model with the community
  3. Monitor usage through Hugging Face analytics
  4. Consider fine-tuning further based on feedback

Support

If you encounter issues:

  1. Check the error messages carefully
  2. Verify your HF token permissions
  3. Ensure all model files are present
  4. Try running individual steps manually
  5. Check the logs for detailed error information

Happy deploying! 🚀