SmolLM3 End-to-End Fine-tuning Pipeline
This repository provides a complete end-to-end pipeline for fine-tuning SmolLM3 models with integrated experiment tracking, monitoring, and model deployment.
Quick Start
1. Setup Configuration
# Run the setup script to configure with your information
python setup_launch.py
This will prompt you for:
- Your Hugging Face username
- Your Hugging Face token
- Optional model and dataset customizations
2. Check Requirements
# Verify all dependencies are installed
python check_requirements.py
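The contents of check_requirements.py are not shown here. As a rough sketch of what a dependency check of this kind could look like, the snippet below uses `importlib.util.find_spec` to report which modules cannot be imported. The package list is an assumption made for illustration, not the script's actual list.

```python
import importlib.util

# Assumed package list for illustration; the real check_requirements.py may differ.
REQUIRED_MODULES = ["torch", "transformers", "datasets", "huggingface_hub"]


def find_missing(modules):
    """Return the top-level module names that cannot be imported."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

`find_spec` only locates a module and does not import it, so the check stays fast even for heavy packages such as PyTorch.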
3. Run the Pipeline
# Make the script executable and run
chmod +x launch.sh
./launch.sh
What the Pipeline Does
The end-to-end pipeline performs the following steps:
1. Environment Setup
- Installs system dependencies
- Creates Python virtual environment
- Installs PyTorch with CUDA support
- Installs all required Python packages
2. Trackio Space Deployment
- Creates a new Hugging Face Space for experiment tracking
- Configures the Trackio monitoring interface
- Sets up environment variables
3. HF Dataset Setup
- Creates a Hugging Face Dataset repository for experiment storage
- Configures dataset access and permissions
- Sets up initial experiment data structure
4. Dataset Preparation
- Downloads the specified dataset from Hugging Face Hub
- Converts to training format (prompt/completion pairs)
- Handles multiple dataset formats automatically
- Creates train/validation splits
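A minimal sketch of the train/validation split step, assuming the prepared examples are a list of dictionaries and that each split is written as a single JSON array. The function name, the `val_fraction` and `seed` parameters, and the file layout are illustrative assumptions; the pipeline's own implementation may differ.

```python
import json
import random
from pathlib import Path


def split_and_save(examples, out_dir, val_fraction=0.1, seed=42):
    """Shuffle examples deterministically and write train.json / validation.json."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation example when there is more than one example.
    n_val = max(1, int(len(shuffled) * val_fraction)) if len(shuffled) > 1 else 0
    val, train = shuffled[:n_val], shuffled[n_val:]

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "train.json").write_text(json.dumps(train, indent=2), encoding="utf-8")
    (out / "validation.json").write_text(json.dumps(val, indent=2), encoding="utf-8")
    return len(train), len(val)
```

A fixed seed keeps the split reproducible across runs, which makes validation metrics comparable between experiments.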
5. Training Configuration
- Creates optimized training configuration
- Sets up monitoring integration
- Configures model parameters and hyperparameters
6. Model Training
- Runs the SmolLM3 fine-tuning process
- Logs metrics to Trackio Space in real time
- Saves experiment data to HF Dataset
- Creates checkpoints during training
7. Model Deployment
- Pushes trained model to Hugging Face Hub
- Creates comprehensive model card
- Uploads training results and logs
- Tests the uploaded model
8. Summary Report
- Generates detailed training summary
- Provides links to all resources
- Documents configuration and results
Features
Integrated Monitoring
- Real-time experiment tracking via Trackio Space
- Persistent storage in Hugging Face Datasets
- Comprehensive metrics logging
- System resource monitoring
Flexible Dataset Support
- Automatic format detection and conversion
- Support for multiple dataset types
- Built-in data preprocessing
- Train/validation split handling
Optimized Training
- Flash Attention support for efficiency
- Gradient checkpointing for memory optimization
- Mixed precision training
- Automatic hyperparameter optimization
Complete Deployment
- Automated model upload to Hugging Face Hub
- Comprehensive model cards
- Training results documentation
- Model testing and validation
Monitoring & Tracking
Trackio Space Interface
- Real-time training metrics visualization
- Experiment management and comparison
- System resource monitoring
- Training progress tracking
HF Dataset Storage
- Persistent experiment data storage
- Version-controlled experiment history
- Collaborative experiment sharing
- Automated data backup
Configuration
Required Configuration
Update these variables in launch.sh:
# Your Hugging Face credentials
HF_TOKEN="your_hf_token_here"
HF_USERNAME="your-username"
# Model and dataset
MODEL_NAME="HuggingFaceTB/SmolLM3-3B"
DATASET_NAME="HuggingFaceTB/smoltalk"
# Output repositories
REPO_NAME="your-username/smollm3-finetuned-$(date +%Y%m%d)"
TRACKIO_DATASET_REPO="your-username/trackio-experiments"
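Before launching, it can help to confirm that the template placeholders were actually replaced. The sketch below is an illustrative pre-flight check, not part of the pipeline. It assumes the values are exported as environment variables, and the function name is hypothetical.

```python
import os

# Placeholder values copied from the launch.sh template above.
PLACEHOLDERS = {
    "HF_TOKEN": "your_hf_token_here",
    "HF_USERNAME": "your-username",
}


def unconfigured_variables(env):
    """Return names of variables that are unset, empty, or still hold the placeholder."""
    problems = []
    for name, placeholder in PLACEHOLDERS.items():
        value = env.get(name, "").strip()
        if not value or value == placeholder:
            problems.append(name)
    return problems


if __name__ == "__main__":
    bad = unconfigured_variables(os.environ)
    print("Needs attention:", bad if bad else "nothing")
```

This only catches untouched placeholders. It does not verify that the token is valid, which is what `huggingface-cli whoami` is for.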
Training Parameters
Customize training parameters:
# Training configuration
BATCH_SIZE=2
GRADIENT_ACCUMULATION_STEPS=8
LEARNING_RATE=5e-6
MAX_EPOCHS=3
MAX_SEQ_LENGTH=4096
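These parameters interact: the number of sequences per optimizer step is the per-device batch size times the gradient accumulation steps. The short calculation below works this out for the defaults above, assuming a single GPU (the `num_devices` parameter is an assumption added for illustration).

```python
# Values from the training configuration above.
BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8
MAX_SEQ_LENGTH = 4096


def effective_batch_size(batch_size, grad_accum_steps, num_devices=1):
    """Sequences contributing to each optimizer step."""
    return batch_size * grad_accum_steps * num_devices


def max_tokens_per_step(batch_size, grad_accum_steps, max_seq_length, num_devices=1):
    """Upper bound on tokens per optimizer step (every sequence at full length)."""
    return effective_batch_size(batch_size, grad_accum_steps, num_devices) * max_seq_length


print(effective_batch_size(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 16
print(max_tokens_per_step(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, MAX_SEQ_LENGTH))  # 65536
```

Halving `BATCH_SIZE` while doubling `GRADIENT_ACCUMULATION_STEPS` keeps the effective batch size at 16 and lowers peak memory, at the cost of more forward/backward passes per step.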
Output Structure
After running the pipeline, you'll have:
├── training_dataset/                       # Prepared dataset
│   ├── train.json
│   └── validation.json
├── /output-checkpoint/                     # Model checkpoints
│   ├── config.json
│   ├── pytorch_model.bin
│   └── training_results/
├── training.log                            # Training logs
├── training_summary.md                     # Summary report
└── config/train_smollm3_end_to_end.py      # Training config
Online Resources
The pipeline creates these online resources:
- Model Repository: https://huggingface.co/your-username/smollm3-finetuned-YYYYMMDD
- Trackio Space: https://huggingface.co/spaces/your-username/trackio-monitoring-YYYYMMDD
- Experiment Dataset: https://huggingface.co/datasets/your-username/trackio-experiments
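To find the resources for a particular run, the URL patterns above can be filled in from the username and the launch date. This is a small illustrative helper (the function name is hypothetical). It assumes the `YYYYMMDD` stamp is the date on which the pipeline was launched, as suggested by `$(date +%Y%m%d)` in the configuration.

```python
from datetime import date


def resource_urls(username, day=None):
    """Build the three resource URLs following the naming pattern listed above."""
    stamp = (day or date.today()).strftime("%Y%m%d")
    base = "https://huggingface.co"
    return {
        "model": f"{base}/{username}/smollm3-finetuned-{stamp}",
        "trackio_space": f"{base}/spaces/{username}/trackio-monitoring-{stamp}",
        "experiment_dataset": f"{base}/datasets/{username}/trackio-experiments",
    }


if __name__ == "__main__":
    for name, url in resource_urls("your-username").items():
        print(f"{name}: {url}")
```

Note that the experiment dataset URL carries no date stamp, so successive runs share the same dataset repository.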
Troubleshooting
Common Issues
HF Token Issues
# Verify your token is correct
huggingface-cli whoami
CUDA Issues
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
Memory Issues
# Halve the batch size and double gradient accumulation (effective batch size stays at 16)
BATCH_SIZE=1
GRADIENT_ACCUMULATION_STEPS=16
Dataset Issues
# Test dataset access
python -c "from datasets import load_dataset; print(load_dataset('your-dataset'))"
Debug Mode
Run individual components for debugging:
# Test Trackio deployment (the subshell keeps you in the repository root)
(cd scripts/trackio_tonic && python deploy_trackio_space.py)
# Test dataset setup
(cd scripts/dataset_tonic && python setup_hf_dataset.py)
# Test training (run from the repository root)
python src/train.py config/train_smollm3_end_to_end.py --help
Advanced Usage
Custom Datasets
For custom datasets, ensure they have one of these formats:
// Format 1: Prompt/Completion
{
  "prompt": "What is machine learning?",
  "completion": "Machine learning is..."
}
// Format 2: Instruction/Output
{
  "instruction": "Explain machine learning",
  "output": "Machine learning is..."
}
// Format 3: Chat format
{
  "messages": [
    {"role": "user", "content": "What is ML?"},
    {"role": "assistant", "content": "ML is..."}
  ]
}
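As an illustration of how the three formats above could be normalized into prompt/completion pairs, here is a minimal sketch. The function name is hypothetical, and for the chat format it assumes the first user turn and the first assistant reply after it form the pair. The pipeline's own converter may treat multi-turn conversations differently.

```python
def to_prompt_completion(example):
    """Map one record in any of the three formats above to a prompt/completion pair."""
    if "prompt" in example and "completion" in example:
        return {"prompt": example["prompt"], "completion": example["completion"]}
    if "instruction" in example and "output" in example:
        return {"prompt": example["instruction"], "completion": example["output"]}
    if "messages" in example:
        # Assumption: use the first user turn and the first assistant turn after it.
        prompt = completion = None
        for message in example["messages"]:
            role, content = message.get("role"), message.get("content")
            if role == "user" and prompt is None:
                prompt = content
            elif role == "assistant" and prompt is not None:
                completion = content
                break
        if prompt is not None and completion is not None:
            return {"prompt": prompt, "completion": completion}
    raise ValueError(f"Unrecognized example format with keys: {sorted(example)}")
```

Raising on unrecognized records, rather than skipping them silently, makes a mismatched dataset visible before training starts.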
Custom Models
To use different models, update the configuration:
MODEL_NAME="microsoft/DialoGPT-medium"
MAX_SEQ_LENGTH=1024
Custom Training
Modify training parameters in the generated config:
# In config/train_smollm3_end_to_end.py
config = SmolLM3Config(
    learning_rate=1e-5,  # Custom learning rate
    max_iters=5000,      # Custom training steps
    # ... other parameters
)
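When choosing a step budget, it can be useful to relate steps to epochs. The estimate below assumes that `max_iters` counts optimizer steps and that training runs on a single device. Neither is confirmed here, so treat the result as a rough guide. The function name and the dataset size in the example are hypothetical.

```python
import math


def optimizer_steps(num_examples, batch_size, grad_accum_steps, epochs):
    """Estimate optimizer steps for a number of epochs on a single device."""
    examples_per_step = batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / examples_per_step)
    return steps_per_epoch * epochs


# Hypothetical dataset of 10,000 examples with batch size 2 and 8 accumulation steps.
print(optimizer_steps(10_000, batch_size=2, grad_accum_steps=8, epochs=3))  # 1875
```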
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test the pipeline
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Hugging Face for the excellent transformers library
- The SmolLM3 team for the base model
- The Trackio team for experiment tracking
- The open-source community for contributions
Support
For issues and questions:
- Check the troubleshooting section
- Review the logs in training.log
- Check the Trackio Space for monitoring data
- Open an issue on GitHub
Happy Fine-tuning!