# 🧠 Model Card: Sam-2.5-4 (Pro Version)
## 📋 Overview
Sam-2.5-4 is the Pro continuation of the Sam-2.5 architecture series, designed for modular, multi-domain reasoning across math, dialogue, code, and open-domain tasks. It builds directly on Sam-2.5-3, continuing training for four additional epochs to deepen convergence, reduce domain bias, and improve generalization.
This model is optimized for transparency, ablation-readiness, and deployment on both high-resource and low-resource devices (including Raspberry Pi).
## 🧬 Model Lineage
| Version | Description |
|---|---|
| Sam-2.5-2 | GSM8K-heavy fine-tune; overfit to math; lacked domain balance |
| Sam-2.5-3 | Emergency patch; retrained from scratch on 4 datasets; balanced capabilities |
| Sam-2.5-4 | Pro version; continued training for 4 additional epochs; refined convergence and fluency |
## 🧠 Architecture
- Transformer-based, modular design
- Registry-driven domain tagging and ablation toggles
- Shape-adaptive loss functions with domain-aware diagnostics
- Quantization-ready for Pi deployment
- Verbose logging for batch-level feedback and anomaly tracing
- Memory-safe serialization via `safetensors` (see the sketch below)
## 📚 Training Datasets
| Dataset | Domain Focus |
|---|---|
| GSM8K | Mathematical reasoning |
| MultiWOZ | Multi-turn dialogue & task flow |
| Alpaca-Code-Cleaned | Code generation & logic |
| UltraChat-200k | Open-domain conversation |
- Datasets were concatenated, shuffled, and tagged for domain awareness
- Replay and mixing strategies were used to balance underrepresented domains (see the sketch below)
- Training spanned 9 epochs in total (5 in Sam-2.5-3, 4 in Sam-2.5-4)
## 📊 Performance Summary
| Metric | Value (Epochs 8–9) |
|---|---|
| Validation Loss | ≈ 2.95 (average across domains) |
| Max Domain Loss | < 3.4 (no domain exceeded this) |
| Math Bias | Resolved (loss spikes absorbed) |
| Dialogue Coherence | Improved (MultiWOZ eval) |
| Code Determinism | Increased (Alpaca eval) |
| Open-Domain Fluency | Fewer hallucinations, better grounding |
## 🧪 Evaluation & Diagnostics
- Loss spikes in early epochs traced to GSM8K; resolved by epoch 6
- Batch-level diagnostics are printed per domain and token type (see the sketch after this list)
- Attention stability improved on long-context prompts
- Token transitions cleaner across dialogue and code tasks
- Validation curve shows smooth convergence post-epoch 5
## 🧩 Deployment Notes
- Compatible with Raspberry Pi (quantized + safetensors; see the sketch after this list)
- Supports CLI-based training diagnostics (loss, ETA, memory)
- Registry hooks enable domain-specific ablation and extension
- Ideal for benchmarking on GSM8K, MultiWOZ, UltraChat, and custom blends
## 🤖 Intended Use
- Research on modular Transformer architectures
- Benchmarking across reasoning, dialogue, and code domains
- Deployment on constrained hardware (e.g., Raspberry Pi and other ARM devices)
- Community-driven extension and ablation testing
## ⚠️ Limitations
- Still sensitive to prompt phrasing in edge cases
- Long-context performance may degrade beyond 2k tokens
- Requires domain tags for optimal generalization (illustrated below)
- Not trained on multimodal inputs (text-only)
## 🙏 Acknowledgments
Thanks to the open-source community, dataset curators, and contributors who helped shape Sam-2.5-4. This release reflects our shared commitment to transparent, inspectable, and extensible AI.