# 🧠 Model Card: Sam-2.5-2
## Overview

Sam-2.5-2 is a fine-tuned variant of Sam2.5, optimized for chain-of-thought reasoning on the GSM8K grade-school math benchmark. It retains the base model's modular, ablation-ready architecture and generalizes well across arithmetic and logic-heavy prompts.
## 🔧 Architecture

| Component      | Value                                |
|----------------|--------------------------------------|
| Base Model     | Sam2.5                               |
| Layers         | Unchanged                            |
| Heads          | Unchanged                            |
| FF Multiplier  | Unchanged                            |
| Dropout        | Unchanged                            |
| Tokenizer      | AutoTokenizer                        |
| Shared Weights | lm_head → embed (cloned during save) |
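The shared-weights row refers to standard weight tying between the output head and the input embedding. A minimal sketch of the pattern, with illustrative class and dimension names (not the actual Sam2.5 source):

```python
import torch.nn as nn

class TiedHeadLM(nn.Module):
    """Toy skeleton of lm_head → embed weight tying (illustrative only)."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        # Tie the output projection to the input embedding: both modules
        # now reference the same underlying weight tensor, which is why
        # the tensor must be cloned at save time (see Checkpointing).
        self.lm_head.weight = self.embed.weight
```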
## 🧪 Training Details

| Parameter      | Value                |
|----------------|----------------------|
| Dataset        | GSM8K                |
| Epochs         | 2                    |
| Batch Size     | 2                    |
| Max Length     | 512 tokens           |
| Optimizer      | AdamW                |
| Learning Rate  | 1e-4                 |
| Replay Mixing  | None                 |
| Early Stopping | Manual checkpointing |
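A minimal sketch of the loop these hyperparameters imply, assuming a Hugging Face-style causal LM whose forward accepts `labels` and returns a `.loss`; the dataset and model objects are placeholders, not the actual training script:

```python
from torch.optim import AdamW
from torch.utils.data import DataLoader

EPOCHS, BATCH_SIZE, LR = 2, 2, 1e-4  # values from the table above

def fine_tune(model, tokenized_dataset, device="cuda"):
    loader = DataLoader(tokenized_dataset, batch_size=BATCH_SIZE, shuffle=True)
    optimizer = AdamW(model.parameters(), lr=LR)
    model.train().to(device)
    for _ in range(EPOCHS):
        for batch in loader:
            # Sequences are assumed pre-tokenized and truncated to 512 tokens.
            input_ids = batch["input_ids"].to(device)
            # Standard causal-LM objective: labels are the inputs,
            # shifted internally by the model's forward pass.
            loss = model(input_ids=input_ids, labels=input_ids).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```

Per the table, no automated early-stopping callback was used; checkpoints were instead kept manually at epoch boundaries.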
## 📊 Performance Metrics

| Metric           | Epoch 1 | Epoch 2 |
|------------------|---------|---------|
| Final Train Loss | 0.7826  | 2.7956  |
| Validation Loss  | 2.5932  | 1.8989  |
| Perplexity       | 13.37   | 6.68    |
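The perplexity row follows directly from the validation loss, since perplexity is the exponential of the mean per-token negative log-likelihood: exp(2.5932) ≈ 13.37 and exp(1.8989) ≈ 6.68. A one-liner to reproduce it:

```python
import math

def perplexity(mean_nll: float) -> float:
    # Perplexity = exp(mean negative log-likelihood per token).
    return math.exp(mean_nll)

print(perplexity(2.5932), perplexity(1.8989))  # ≈ 13.37, 6.68
```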
## 📝 Output Quality

- ✅ Fluent chain-of-thought steps
- ✅ Accurate arithmetic reasoning
- ✅ Consistent use of scratchpad format (`<<...>>`; see the checker sketch below)
- ✅ Stable token alignment across nested logic
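GSM8K's scratchpad format wraps each calculation in calculator annotations such as `<<3+4=7>>`. A small checker for that format, offered as a sketch rather than part of any evaluation harness (`check_scratchpad` is a hypothetical helper):

```python
import re

ANNOTATION = re.compile(r"<<([^=<>]+)=([^<>]+)>>")  # e.g. <<3+4=7>>

def check_scratchpad(text: str) -> bool:
    """Return True if every <<expr=result>> annotation evaluates correctly."""
    for expr, result in ANNOTATION.findall(text):
        try:
            # eval() of a bare arithmetic expression; fine for a sketch,
            # but sanitize inputs before running this on untrusted text.
            if abs(eval(expr) - float(result)) > 1e-6:
                return False
        except Exception:
            return False
    return True

print(check_scratchpad("He has 3+4=<<3+4=7>>7 apples."))  # True
```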
## 💾 Checkpointing

- Safe save logic applied to avoid shared-memory (tied-tensor) errors; see the sketch after this list
- Format: `.safetensors`
- Best model: `checkpoints/epoch_2_loss_1.8989/`
- Final model: `checkpoints/final/`
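The safe-save logic relates to the tied lm_head/embedding shown under Architecture: safetensors refuses to serialize tensors that share storage, so the head weight is cloned first. A minimal sketch, with an illustrative function name and path:

```python
from safetensors.torch import save_file

def safe_save(model, path: str) -> None:
    state = model.state_dict()
    # lm_head.weight aliases the embedding weight; clone it so every
    # entry in the state dict owns its own storage before serializing.
    state["lm_head.weight"] = state["lm_head.weight"].clone().contiguous()
    save_file(state, path)

# e.g. safe_save(model, "checkpoints/final/model.safetensors")
```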