# SmolLM3 Fine-tuning
This repository provides a complete setup for fine-tuning SmolLM3 models with the FlexAI console. It follows the nanoGPT repository structure, adapted for modern transformer models.
## Overview
SmolLM3 is a 3B-parameter transformer decoder model optimized for efficiency, long-context reasoning, and multilingual support. This setup allows you to fine-tune SmolLM3 for various tasks including:
- **Supervised Fine-tuning (SFT)**: Adapt the model for instruction following
- **Direct Preference Optimization (DPO)**: Improve model alignment
- **Long-context fine-tuning**: Support for up to 128k tokens
- **Tool calling**: Fine-tune for function calling capabilities
- **Model Quantization**: Create int8 (GPU) and int4 (CPU) quantized versions
## Quick Start
### 1. Repository Setup
The repository is organized for the FlexAI console around the following key files (a sketch of how they fit together follows the list):
- `train.py`: Main entry point script
- `config/train_smollm3.py`: Default configuration
- `model.py`: Model wrapper and loading
- `data.py`: Dataset handling and preprocessing
- `trainer.py`: Training loop and trainer setup
- `requirements.txt`: Dependencies
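To make the entry-point pattern concrete: `train.py` takes a config file as its first argument, loads the `config` object from it, and applies any `--key=value` overrides from the command line. The snippet below is an illustrative sketch only, not the actual `train.py` implementation:

```python
# Illustrative sketch only -- the real logic lives in train.py and may differ.
import importlib.util
import sys

def load_config(path: str):
    """Load a config module (e.g. config/train_smollm3.py) and return its `config` object."""
    spec = importlib.util.spec_from_file_location("train_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.config

def apply_overrides(config, argv):
    """Apply --key=value overrides such as --max_iters=1500 onto the config."""
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            key, raw = arg[2:].split("=", 1)
            if hasattr(config, key):
                field_type = type(getattr(config, key))
                # bool("False") is truthy, so booleans need explicit parsing
                value = raw.lower() in ("1", "true", "yes") if field_type is bool else field_type(raw)
                setattr(config, key, value)
    return config

if __name__ == "__main__":
    config = apply_overrides(load_config(sys.argv[1]), sys.argv[2:])
```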
### 2. FlexAI Console Configuration
When setting up a Fine Tuning Job in the FlexAI console, use these settings:
#### Basic Configuration
- **Name**: `smollm3-finetune`
- **Cluster**: Your organization's designated cluster
- **Checkpoint**: (Optional) Previous training job checkpoint
- **Node Count**: 1
- **Accelerator Count**: 1-8 (depending on your needs)
#### Repository Settings
- **Repository URL**: `https://github.com/your-username/flexai-finetune`
- **Repository Revision**: `main`
#### Dataset Configuration
- **Datasets**: Your dataset (mounted under `/input`)
- **Mount Directory**: `my_dataset`
#### Entry Point
```bash
train.py config/train_smollm3.py --dataset_dir=my_dataset --init_from=resume --out_dir=/input-checkpoint --max_iters=1500
```
### 3. Dataset Format
The script supports multiple dataset formats:
#### Chat Format (Recommended)
```json
[
  {
    "messages": [
      {"role": "user", "content": "What is machine learning?"},
      {"role": "assistant", "content": "Machine learning is a subset of AI..."}
    ]
  }
]
```
#### Instruction Format
```json
[
  {
    "instruction": "What is machine learning?",
    "output": "Machine learning is a subset of AI..."
  }
]
```
#### User-Assistant Format
```json
[
  {
    "user": "What is machine learning?",
    "assistant": "Machine learning is a subset of AI..."
  }
]
```
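All three formats carry the same information, so the data pipeline can normalize them into chat messages before tokenization. As an illustration (the actual preprocessing lives in `data.py` and may differ), such a normalizer could look like:

```python
# Illustrative normalizer; see data.py for the actual preprocessing logic.
def to_messages(example: dict) -> list:
    """Convert any supported record format into the chat-message format."""
    if "messages" in example:       # chat format: already normalized
        return example["messages"]
    if "instruction" in example:    # instruction format
        return [
            {"role": "user", "content": example["instruction"]},
            {"role": "assistant", "content": example["output"]},
        ]
    if "user" in example:           # user-assistant format
        return [
            {"role": "user", "content": example["user"]},
            {"role": "assistant", "content": example["assistant"]},
        ]
    raise ValueError(f"Unrecognized dataset record with keys: {list(example)}")
```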
### 4. Configuration Options
The default configuration in `config/train_smollm3.py` includes:
```python
from dataclasses import dataclass

@dataclass
class SmolLM3Config:
    # Model configuration
    model_name: str = "HuggingFaceTB/SmolLM3-3B"
    max_seq_length: int = 4096
    use_flash_attention: bool = True

    # Training configuration
    batch_size: int = 4
    gradient_accumulation_steps: int = 4
    learning_rate: float = 2e-5
    max_iters: int = 1000

    # Mixed precision
    fp16: bool = True
    bf16: bool = False
```
Note that `batch_size` and `gradient_accumulation_steps` multiply: with the defaults above, each optimizer step processes an effective batch of 16 sequences.
### 5. Command Line Arguments
The `train.py` script accepts various arguments:
```bash
# Basic usage
python train.py config/train_smollm3.py

# With custom parameters
python train.py config/train_smollm3.py \
    --dataset_dir=my_dataset \
    --out_dir=/output-checkpoint \
    --init_from=resume \
    --max_iters=1500 \
    --batch_size=8 \
    --learning_rate=1e-5 \
    --max_seq_length=8192
```
## Advanced Usage
### 1. Custom Configuration
Create a custom configuration file:
```python
# config/my_config.py
from config.train_smollm3 import SmolLM3Config

config = SmolLM3Config(
    model_name="HuggingFaceTB/SmolLM3-3B-Instruct",
    max_seq_length=8192,
    batch_size=2,
    learning_rate=1e-5,
    max_iters=2000,
    use_flash_attention=True,
    fp16=True
)
```
Then point the entry point at it: `python train.py config/my_config.py`.
### 2. Long-Context Fine-tuning
For long-context tasks (up to 128k tokens):
```python
config = SmolLM3Config(
    max_seq_length=131072,  # 128k tokens
    model_name="HuggingFaceTB/SmolLM3-3B",
    use_flash_attention=True,
    gradient_checkpointing=True
)
```
### 3. DPO Training
For preference optimization, use the DPO trainer:
```python
from trainer import SmolLM3DPOTrainer

dpo_trainer = SmolLM3DPOTrainer(
    model=model,
    dataset=dataset,
    config=config,
    output_dir="./dpo-output"
)
dpo_trainer.train()
```
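DPO trains on preference pairs rather than single responses, so each prompt needs a chosen and a rejected completion. The exact schema is defined by `trainer.py`; a typical layout (assumed here for illustration) is:

```python
# Hypothetical preference-pair layout; check trainer.py for the exact schema it expects.
preference_data = [
    {
        "prompt": "What is machine learning?",
        "chosen": "Machine learning is a subset of AI that learns patterns from data...",
        "rejected": "I don't know.",
    }
]
```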
### 4. Tool Calling Fine-tuning
Include tool calling examples in your dataset:
```json
[
  {
    "messages": [
      {"role": "user", "content": "What's the weather in New York?"},
      {"role": "assistant", "content": "<tool_call>\n<invoke name=\"get_weather\">\n<parameter name=\"location\">New York</parameter>\n</invoke>\n</tool_call>"},
      {"role": "tool", "content": "The weather in New York is 72°F and sunny."},
      {"role": "assistant", "content": "The weather in New York is currently 72°F and sunny."}
    ]
  }
]
```
## Model Variants
SmolLM3 comes in several variants:
- **SmolLM3-3B-Base**: Base model for general fine-tuning
- **SmolLM3-3B**: Instruction-tuned model
- **SmolLM3-3B-Instruct**: Enhanced instruction model
- **Quantized versions**: Available for deployment
## Hardware Requirements
### Minimum Requirements
- **GPU**: 16GB+ VRAM (for 3B model)
- **RAM**: 32GB+ system memory
- **Storage**: 50GB+ free space
### Recommended
- **GPU**: A100/H100 or similar
- **RAM**: 64GB+ system memory
- **Storage**: 100GB+ SSD
## Troubleshooting
### Common Issues
1. **Out of Memory (OOM)**
   - Reduce `batch_size`
   - Increase `gradient_accumulation_steps`
   - Enable `gradient_checkpointing`
   - Use `fp16` or `bf16` (a combined config sketch follows this list)
2. **Slow Training**
   - Enable `flash_attention`
   - Use mixed precision (`fp16`/`bf16`)
   - Increase `dataloader_num_workers`
3. **Dataset Loading Issues**
   - Check the dataset format
   - Ensure proper JSON structure
   - Verify file permissions
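For reference, the memory-saving options from item 1 can be combined in a single configuration. The values below are illustrative and should be tuned for your hardware:

```python
# Illustrative memory-saving settings; tune the values for your GPU.
config = SmolLM3Config(
    batch_size=1,                      # smaller per-device batch
    gradient_accumulation_steps=16,    # keeps the effective batch size at 16
    max_seq_length=2048,               # shorter sequences reduce activation memory
    gradient_checkpointing=True,       # recompute activations instead of storing them
    fp16=True,                         # mixed precision halves activation memory
)
```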
### Debug Mode
Enable debug logging:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## Evaluation
After training, evaluate your model:
```python
from transformers import pipeline

pipe = pipeline(
    task="text-generation",
    model="./output-checkpoint",
    device=0,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7
)

# Test the model
messages = [{"role": "user", "content": "Explain gravity in simple terms."}]
outputs = pipe(messages)
print(outputs[0]["generated_text"][-1]["content"])
```
## Model Quantization
The pipeline includes built-in quantization support, using torchao to create optimized model versions that live alongside the main model in a unified repository structure:
### Repository Structure
All models (main and quantized) are stored in a single repository:
```
your-username/model-name/
├── README.md          (unified model card)
├── config.json
├── pytorch_model.bin
├── tokenizer.json
├── int8/              (quantized model for GPU)
└── int4/              (quantized model for CPU)
```
### Quantization Types
- **int8_weight_only**: GPU optimized, ~50% memory reduction
- **int4_weight_only**: CPU optimized, ~75% memory reduction
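Under the hood, the quantization goes through torchao's integration with `transformers`. Below is a minimal sketch of quantizing a fine-tuned checkpoint at load time; paths and repository names are placeholders, and the pipeline's actual logic lives in `scripts/model_tonic/`:

```python
# Minimal sketch using transformers' torchao integration; the pipeline's actual
# quantization code lives in scripts/model_tonic/ and may differ in detail.
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

quant_config = TorchAoConfig("int8_weight_only")  # or "int4_weight_only" for CPU targets
model = AutoModelForCausalLM.from_pretrained(
    "./output-checkpoint",               # your fine-tuned model
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,
)
# torchao tensors are not safetensors-compatible, so disable safe serialization
model.push_to_hub("your-username/model-name", safe_serialization=False)
```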
### Automatic Quantization
When using the interactive pipeline (`launch.sh`), you'll be prompted to create quantized versions after training:
```bash
./launch.sh
# ... training completes ...
# Choose quantization options when prompted
```
### Standalone Quantization
Quantize existing models independently:
```bash
# Quantize and push to the HF Hub (same repository)
python scripts/model_tonic/quantize_standalone.py /path/to/model your-username/model-name \
    --quant-type int8_weight_only \
    --token YOUR_HF_TOKEN

# Quantize and save locally
python scripts/model_tonic/quantize_standalone.py /path/to/model your-username/model-name \
    --quant-type int4_weight_only \
    --device cpu \
    --save-only
```
### Loading Quantized Models
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the main model
model = AutoModelForCausalLM.from_pretrained(
    "your-username/model-name",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")

# Load the int8 quantized model (GPU) from its subfolder
model = AutoModelForCausalLM.from_pretrained(
    "your-username/model-name",
    subfolder="int8",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name", subfolder="int8")

# Load the int4 quantized model (CPU) from its subfolder
model = AutoModelForCausalLM.from_pretrained(
    "your-username/model-name",
    subfolder="int4",
    device_map="cpu",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name", subfolder="int4")
```
For detailed quantization documentation, see [QUANTIZATION_GUIDE.md](docs/QUANTIZATION_GUIDE.md).
### Unified Model Cards
The system generates comprehensive model cards that include information about all model variants:
- **Single README**: One comprehensive model card for the entire repository
- **Conditional Sections**: Quantized model information appears when available
- **Usage Examples**: Complete examples for all model variants
- **Performance Information**: Memory and speed benefits for each quantization type
For detailed information about the unified model card system, see [UNIFIED_MODEL_CARD_GUIDE.md](docs/UNIFIED_MODEL_CARD_GUIDE.md).
## Deployment
### Using vLLM
```bash
vllm serve ./output-checkpoint --enable-auto-tool-choice
```
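vLLM exposes an OpenAI-compatible API (port 8000 by default), so the served model can be queried with the `openai` client. A minimal sketch; the `model` value must match the name vLLM registered at startup:

```python
# Minimal sketch of querying vLLM's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key
response = client.chat.completions.create(
    model="./output-checkpoint",  # must match the model name vLLM serves
    messages=[{"role": "user", "content": "Explain gravity in simple terms."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```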
### Using llama.cpp
```bash
# Convert to GGUF format (requires a llama.cpp checkout for the conversion script)
python llama.cpp/convert_hf_to_gguf.py ./output-checkpoint --outfile model.gguf
```
## Resources
- [SmolLM3 Blog Post](https://huggingface.co/blog/smollm3)
- [Model Repository](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
- [GitHub Repository](https://github.com/huggingface/smollm)
- [SmolTalk Dataset](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)
## License
This project follows the same license as the SmolLM3 model. Please refer to the Hugging Face model page for licensing information.