# SmolLM3 Fine-tuning

This repository provides a complete setup for fine-tuning SmolLM3 models using the FlexAI console, following the nanoGPT structure but adapted for modern transformer models.

## Overview

SmolLM3 is a 3B-parameter transformer decoder model optimized for efficiency, long-context reasoning, and multilingual support. This setup allows you to fine-tune SmolLM3 for various tasks including:

- **Supervised Fine-tuning (SFT)**: Adapt the model for instruction following
- **Direct Preference Optimization (DPO)**: Improve model alignment
- **Long-context fine-tuning**: Support for up to 128k tokens
- **Tool calling**: Fine-tune for function calling capabilities
- **Model Quantization**: Create int8 (GPU) and int4 (CPU) quantized versions

## Quick Start

### 1. Repository Setup

The repository follows the FlexAI console structure with the following key files:

- `train.py`: Main entry point script
- `config/train_smollm3.py`: Default configuration
- `model.py`: Model wrapper and loading
- `data.py`: Dataset handling and preprocessing
- `trainer.py`: Training loop and trainer setup
- `requirements.txt`: Dependencies

### 2. FlexAI Console Configuration

When setting up a Fine Tuning Job in the FlexAI console, use these settings:

#### Basic Configuration
- **Name**: `smollm3-finetune`
- **Cluster**: Your organization's designated cluster
- **Checkpoint**: (Optional) Previous training job checkpoint
- **Node Count**: 1
- **Accelerator Count**: 1-8 (depending on your needs)

#### Repository Settings
- **Repository URL**: `https://github.com/your-username/flexai-finetune`
- **Repository Revision**: `main`

#### Dataset Configuration
- **Datasets**: Your dataset (mounted under `/input`)
- **Mount Directory**: `my_dataset`

#### Entry Point
```
train.py config/train_smollm3.py --dataset_dir=my_dataset --init_from=resume --out_dir=/input-checkpoint --max_iters=1500
```

### 3. Dataset Format

The script supports multiple dataset formats:

#### Chat Format (Recommended)
```json
[
  {
    "messages": [
      {"role": "user", "content": "What is machine learning?"},
      {"role": "assistant", "content": "Machine learning is a subset of AI..."}
    ]
  }
]
```

#### Instruction Format
```json
[
  {
    "instruction": "What is machine learning?",
    "output": "Machine learning is a subset of AI..."
  }
]
```

#### User-Assistant Format
```json
[
  {
    "user": "What is machine learning?",
    "assistant": "Machine learning is a subset of AI..."
  }
]
```
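
These formats can be normalized to the chat layout before tokenization. The helper below is a minimal sketch of that normalization (the function name and field handling are illustrative, not the exact code in `data.py`):

```python
def to_chat_format(example: dict) -> dict:
    """Normalize a raw example into the chat `messages` layout (sketch only)."""
    if "messages" in example:  # already chat format
        return {"messages": example["messages"]}
    if "instruction" in example:  # instruction format
        return {"messages": [
            {"role": "user", "content": example["instruction"]},
            {"role": "assistant", "content": example["output"]},
        ]}
    if "user" in example:  # user-assistant format
        return {"messages": [
            {"role": "user", "content": example["user"]},
            {"role": "assistant", "content": example["assistant"]},
        ]}
    raise ValueError(f"Unrecognized example keys: {sorted(example)}")
```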

### 4. Configuration Options

The default configuration in `config/train_smollm3.py` includes:

```python
from dataclasses import dataclass

@dataclass
class SmolLM3Config:
    # Model configuration
    model_name: str = "HuggingFaceTB/SmolLM3-3B"
    max_seq_length: int = 4096
    use_flash_attention: bool = True
    
    # Training configuration
    batch_size: int = 4
    gradient_accumulation_steps: int = 4
    learning_rate: float = 2e-5
    max_iters: int = 1000
    
    # Mixed precision
    fp16: bool = True
    bf16: bool = False
```
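
With these defaults, the effective batch size is `batch_size * gradient_accumulation_steps = 16` sequences per device per optimizer step; keep that product in mind when adjusting either value or the learning rate.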

### 5. Command Line Arguments

The `train.py` script accepts various arguments:

```bash
# Basic usage
python train.py config/train_smollm3.py

# With custom parameters
python train.py config/train_smollm3.py \
    --dataset_dir=my_dataset \
    --out_dir=/output-checkpoint \
    --init_from=resume \
    --max_iters=1500 \
    --batch_size=8 \
    --learning_rate=1e-5 \
    --max_seq_length=8192
```
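
Because the layout follows nanoGPT, `--key=value` overrides are most naturally applied by matching each flag against a field of the loaded config. A minimal sketch of that pattern (illustrative; the actual parsing in `train.py` may differ):

```python
import sys
from dataclasses import fields

def apply_overrides(config, argv=None):
    """Apply nanoGPT-style --key=value overrides to a dataclass config (sketch)."""
    argv = sys.argv[1:] if argv is None else argv
    valid = {f.name for f in fields(config)}
    for arg in argv:
        if not arg.startswith("--"):
            continue  # positional args (e.g. the config file) are handled elsewhere
        key, _, raw = arg[2:].partition("=")
        if key not in valid:
            raise ValueError(f"Unknown config key: {key}")
        current = getattr(config, key)
        # Cast the string to the type of the existing default value
        if isinstance(current, bool):
            value = raw.lower() in ("1", "true", "yes")
        else:
            value = type(current)(raw)
        setattr(config, key, value)
    return config
```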

## Advanced Usage

### 1. Custom Configuration

Create a custom configuration file:

```python
# config/my_config.py
from config.train_smollm3 import SmolLM3Config

config = SmolLM3Config(
    model_name="HuggingFaceTB/SmolLM3-3B-Instruct",
    max_seq_length=8192,
    batch_size=2,
    learning_rate=1e-5,
    max_iters=2000,
    use_flash_attention=True,
    fp16=True
)
```

### 2. Long-Context Fine-tuning

For long-context tasks (up to 128k tokens):

```python
config = SmolLM3Config(
    max_seq_length=131072,  # 128k tokens
    model_name="HuggingFaceTB/SmolLM3-3B",
    use_flash_attention=True,
    gradient_checkpointing=True
)
```

### 3. DPO Training

For preference optimization, use the DPO trainer:

```python
from trainer import SmolLM3DPOTrainer

dpo_trainer = SmolLM3DPOTrainer(
    model=model,
    dataset=dataset,
    config=config,
    output_dir="./dpo-output"
)

dpo_trainer.train()
```

### 4. Tool Calling Fine-tuning

Include tool calling examples in your dataset:

```json
[
  {
    "messages": [
      {"role": "user", "content": "What's the weather in New York?"},
      {"role": "assistant", "content": "<tool_call>\n<invoke name=\"get_weather\">\n<parameter name=\"location\">New York</parameter>\n</invoke>\n</tool_call>"},
      {"role": "tool", "content": "The weather in New York is 72Β°F and sunny."},
      {"role": "assistant", "content": "The weather in New York is currently 72Β°F and sunny."}
    ]
  }
]
```

## Model Variants

SmolLM3 comes in several variants:

- **SmolLM3-3B-Base**: Base model for general fine-tuning
- **SmolLM3-3B**: Instruction-tuned model
- **SmolLM3-3B-Instruct**: Enhanced instruction model
- **Quantized versions**: Available for deployment

## Hardware Requirements

### Minimum Requirements
- **GPU**: 16GB+ VRAM (for 3B model)
- **RAM**: 32GB+ system memory
- **Storage**: 50GB+ free space

### Recommended
- **GPU**: A100/H100 or similar
- **RAM**: 64GB+ system memory
- **Storage**: 100GB+ SSD

## Troubleshooting

### Common Issues

1. **Out of Memory (OOM)**
   - Reduce `batch_size`
   - Increase `gradient_accumulation_steps`
   - Enable `gradient_checkpointing`
   - Use `fp16` or `bf16` (combined in the sketch after this list)

2. **Slow Training**
   - Enable `flash_attention`
   - Use mixed precision (`fp16`/`bf16`)
   - Increase `dataloader_num_workers`

3. **Dataset Loading Issues**
   - Check dataset format
   - Ensure proper JSON structure
   - Verify file permissions
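
Applied together, the OOM mitigations from issue 1 might look like this against the config shown earlier (a sketch; it keeps the effective batch size at 16 while trading compute for memory):

```python
config = SmolLM3Config(
    batch_size=1,                    # smaller per-device micro-batch
    gradient_accumulation_steps=16,  # keeps effective batch size at 16
    gradient_checkpointing=True,     # recompute activations to save memory
    bf16=True,                       # prefer bf16 on Ampere or newer GPUs
    fp16=False,
)
```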

### Debug Mode

Enable debug logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## Evaluation

After training, evaluate your model:

```python
from transformers import pipeline

pipe = pipeline(
    task="text-generation",
    model="./output-checkpoint",
    device=0,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7
)

# Test the model
messages = [{"role": "user", "content": "Explain gravity in simple terms."}]
outputs = pipe(messages)
print(outputs[0]["generated_text"][-1]["content"])
```

## Model Quantization

The pipeline includes built-in quantization support via torchao, producing optimized model variants that live alongside the main model in a unified repository structure:

### Repository Structure

All models (main and quantized) are stored in a single repository:

```
your-username/model-name/
β”œβ”€β”€ README.md (unified model card)
β”œβ”€β”€ config.json
β”œβ”€β”€ pytorch_model.bin
β”œβ”€β”€ tokenizer.json
β”œβ”€β”€ int8/ (quantized model for GPU)
└── int4/ (quantized model for CPU)
```

### Quantization Types

- **int8_weight_only**: GPU optimized, ~50% memory reduction
- **int4_weight_only**: CPU optimized, ~75% memory reduction
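
Both variants map onto the torchao integration in `transformers`. A minimal sketch of how a quantized copy could be produced and saved into the repository layout above (assuming a recent `transformers` with `TorchAoConfig`; paths are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

# int8 weight-only quantization via torchao (the GPU-oriented variant)
quant_config = TorchAoConfig(quant_type="int8_weight_only")

model = AutoModelForCausalLM.from_pretrained(
    "./output-checkpoint",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,
)

# torchao tensor subclasses require non-safetensors serialization
model.save_pretrained("model-name/int8", safe_serialization=False)
```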

### Automatic Quantization

When using the interactive pipeline (`launch.sh`), you'll be prompted to create quantized versions after training:

```bash
./launch.sh
# ... training completes ...
# Choose quantization options when prompted
```

### Standalone Quantization

Quantize existing models independently:

```bash
# Quantize and push to HF Hub (same repository)
python scripts/model_tonic/quantize_standalone.py /path/to/model your-username/model-name \
    --quant-type int8_weight_only \
    --token YOUR_HF_TOKEN

# Quantize and save locally
python scripts/model_tonic/quantize_standalone.py /path/to/model your-username/model-name \
    --quant-type int4_weight_only \
    --device cpu \
    --save-only
```

### Loading Quantized Models

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load main model
model = AutoModelForCausalLM.from_pretrained(
    "your-username/model-name",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")

# Load int8 quantized model (GPU); the quantized weights live in a
# subfolder of the same repository, so pass `subfolder` explicitly
model = AutoModelForCausalLM.from_pretrained(
    "your-username/model-name",
    subfolder="int8",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
# The tokenizer is shared and lives at the repository root
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")

# Load int4 quantized model (CPU)
model = AutoModelForCausalLM.from_pretrained(
    "your-username/model-name",
    subfolder="int4",
    device_map="cpu",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")
```

For detailed quantization documentation, see [QUANTIZATION_GUIDE.md](docs/QUANTIZATION_GUIDE.md).

### Unified Model Cards

The system generates comprehensive model cards that include information about all model variants:

- **Single README**: One comprehensive model card for the entire repository
- **Conditional Sections**: Quantized model information appears when available
- **Usage Examples**: Complete examples for all model variants
- **Performance Information**: Memory and speed benefits for each quantization type

For detailed information about the unified model card system, see [UNIFIED_MODEL_CARD_GUIDE.md](docs/UNIFIED_MODEL_CARD_GUIDE.md).

## Deployment

### Using vLLM
```bash
vllm serve ./output-checkpoint --enable-auto-tool-choice
```
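
The server exposes an OpenAI-compatible API (port 8000 by default), so any OpenAI client can query it; for example, with the `openai` Python package:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; no real API key is required
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="./output-checkpoint",  # must match the path given to `vllm serve`
    messages=[{"role": "user", "content": "Explain gravity in simple terms."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```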

### Using llama.cpp
```bash
# Convert to GGUF format with llama.cpp's conversion script
# (run from a checkout of https://github.com/ggerganov/llama.cpp)
python convert_hf_to_gguf.py ./output-checkpoint --outfile model.gguf
```

## Resources

- [SmolLM3 Blog Post](https://huggingface.co/blog/smollm3)
- [Model Repository](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
- [GitHub Repository](https://github.com/huggingface/smollm)
- [SmolTalk Dataset](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)

## License

This project follows the same license as the SmolLM3 model. Please refer to the Hugging Face model page for licensing information. 

