Model Recovery and Deployment Guide
This guide will help you recover your trained model from the cloud instance and deploy it to Hugging Face Hub with quantization.
Prerequisites
- Hugging Face Token: You need a Hugging Face token with write permissions
- Cloud Instance Access: SSH access to your cloud instance
- Model Files: Your trained model should be in /output-checkpoint/ on the cloud instance
Step 1: Connect to Your Cloud Instance
ssh root@your-cloud-instance-ip
cd ~/smollm3_finetune
Step 2: Set Your Hugging Face Token
export HF_TOKEN=your_huggingface_token_here
Replace your_huggingface_token_here with your actual Hugging Face token.
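If the token is missing, later steps fail with an authentication error midway through the upload. A small sketch of failing fast instead (the helper name is illustrative, not part of this repo's scripts):

```python
import os

def get_hf_token() -> str:
    """Read the Hugging Face token from the environment, failing fast if absent."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; run: export HF_TOKEN=your_token_here")
    return token
```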
Step 3: Verify Model Files
Check that your model files exist:
ls -la /output-checkpoint/
You should see files like:
config.json
model.safetensors.index.json
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
tokenizer.json
tokenizer_config.json
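The same check can be scripted. A minimal sketch based on the file list above (this helper is illustrative and not part of cloud_deploy.py):

```python
from pathlib import Path

# Files the deployment expects in the checkpoint directory (per the list above).
REQUIRED_FILES = [
    "config.json",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer_config.json",
]

def missing_model_files(checkpoint_dir: str) -> list[str]:
    """Return the names of required files absent from the checkpoint directory."""
    root = Path(checkpoint_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty return value means the required metadata files are present; the sharded *.safetensors weights vary in count, so they are easiest to eyeball with ls.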
Step 4: Update Configuration
Edit the deployment script to use your Hugging Face username:
nano cloud_deploy.py
Change this line:
REPO_NAME = "your-username/smollm3-finetuned" # Change to your HF username and desired repo name
To your actual username, for example:
REPO_NAME = "tonic/smollm3-finetuned"
Step 5: Run the Deployment
Execute the deployment script:
python3 cloud_deploy.py
This will:
- ✅ Validate your model files
- ✅ Install required dependencies (torchao, huggingface_hub)
- ✅ Push the main model to Hugging Face Hub
- ✅ Create quantized versions (int8 and int4)
- ✅ Push quantized models to subdirectories
Step 6: Verify Deployment
After successful deployment, you can verify:
- Main Model: https://huggingface.co/your-username/smollm3-finetuned
- int8 Quantized: https://huggingface.co/your-username/smollm3-finetuned/tree/main/int8
- int4 Quantized: https://huggingface.co/your-username/smollm3-finetuned/tree/main/int4
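A tiny helper can print these links for any repo name, which is handy when scripting several deployments (hypothetical convenience function, not part of the repo):

```python
def deployment_urls(repo_name: str) -> dict[str, str]:
    """Build the Hub URLs for the main model and its quantized subfolders."""
    base = f"https://huggingface.co/{repo_name}"
    return {
        "main": base,
        "int8": f"{base}/tree/main/int8",
        "int4": f"{base}/tree/main/int4",
    }
```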
Alternative: Manual Deployment
If you prefer to run the steps manually:
1. Push Main Model Only
python3 scripts/model_tonic/push_to_huggingface.py \
/output-checkpoint/ \
your-username/smollm3-finetuned \
--hf-token $HF_TOKEN \
--author-name "Your Name" \
--model-description "A fine-tuned SmolLM3 model for improved text generation"
2. Quantize and Push (Optional)
# int8 quantization (GPU optimized)
python3 scripts/model_tonic/quantize_model.py \
/output-checkpoint/ \
your-username/smollm3-finetuned \
--quant-type int8_weight_only \
--hf-token $HF_TOKEN
# int4 quantization (CPU optimized)
python3 scripts/model_tonic/quantize_model.py \
/output-checkpoint/ \
your-username/smollm3-finetuned \
--quant-type int4_weight_only \
--hf-token $HF_TOKEN
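Conceptually, int8 weight-only quantization stores each weight matrix as int8 values plus a per-row scale, so weights take roughly a quarter of the float32 memory. A minimal numpy sketch of that idea (this is a conceptual illustration, not the torchao implementation the script uses):

```python
import numpy as np

def quantize_int8_weight_only(w: np.ndarray):
    """Symmetric per-row int8 quantization: w ≈ q * scale."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale
```

int4 packs two values per byte for even smaller weights at some accuracy cost, which is why the guide pairs it with CPU inference.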
Troubleshooting
Common Issues
HF_TOKEN not set
export HF_TOKEN=your_token_here
Model files not found
ls -la /output-checkpoint/
Make sure the training completed successfully.
Dependencies missing
pip install torchao huggingface_hub
Permission denied
chmod +x cloud_deploy.py
chmod +x recover_model.py
Error Messages
- "Missing required model files": Check that your model training completed successfully
- "Repository creation failed": Verify your HF token has write permissions
- "Quantization failed": Check GPU memory availability or try CPU quantization
Model Usage
Once deployed, you can use your model:
from transformers import AutoModelForCausalLM, AutoTokenizer
# Main model
model = AutoModelForCausalLM.from_pretrained("your-username/smollm3-finetuned")
tokenizer = AutoTokenizer.from_pretrained("your-username/smollm3-finetuned")
# int8 quantized (GPU optimized) -- the quantized weights live in a subfolder of the repo
model = AutoModelForCausalLM.from_pretrained("your-username/smollm3-finetuned", subfolder="int8")
tokenizer = AutoTokenizer.from_pretrained("your-username/smollm3-finetuned", subfolder="int8")
# int4 quantized (CPU optimized)
model = AutoModelForCausalLM.from_pretrained("your-username/smollm3-finetuned", subfolder="int4")
tokenizer = AutoTokenizer.from_pretrained("your-username/smollm3-finetuned", subfolder="int4")
# Generate text
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
File Structure
After deployment, your repository will have:
your-username/smollm3-finetuned/
├── README.md (model card)
├── config.json
├── model.safetensors.index.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── tokenizer.json
├── tokenizer_config.json
├── int8/ (quantized model for GPU)
│   ├── README.md
│   ├── config.json
│   └── pytorch_model.bin
└── int4/ (quantized model for CPU)
    ├── README.md
    ├── config.json
    └── pytorch_model.bin
Success Indicators
✅ Successful deployment shows:
- "Model recovery and deployment completed successfully!"
- "View your model at: https://huggingface.co/your-username/smollm3-finetuned"
- No error messages in the output
❌ Failed deployment shows:
- Error messages about missing files or permissions
- "Model recovery and deployment failed!"
Next Steps
After successful deployment:
- Test your model on Hugging Face Hub
- Share your model with the community
- Monitor usage through Hugging Face analytics
- Consider fine-tuning further based on feedback
Support
If you encounter issues:
- Check the error messages carefully
- Verify your HF token permissions
- Ensure all model files are present
- Try running individual steps manually
- Check the logs for detailed error information
Happy deploying! 🚀