Outlier-Safe Pre-Training (OSP)
A collection of ablation and final models trained with the Outlier-Safe Pre-Training (OSP) framework.
- Paper: arXiv:2506.19697
dmis-lab/OSP-1.4B-1T-Muon-SSNorm-EmbProj
Note: Trained with the OSP framework. This is our final model.
dmis-lab/OSP-1.4B-1T-Adam
Note: Trained with the standard Adam optimizer, without any modifications.
dmis-lab/OSP-1.4B-100B-Adam
Note: Ablation trained with the standard Adam optimizer, without any modifications. [100B tokens]
dmis-lab/OSP-1.4B-100B-Muon
Note: Ablation trained with the Muon optimizer, without any modifications. [100B tokens]
dmis-lab/OSP-1.4B-100B-Muon-Only
Note: Ablation trained with the Muon optimizer, without any modifications. No Adam optimizer is used for the embeddings. [100B tokens]
dmis-lab/OSP-1.4B-100B-Muon-SSNorm
Note: Ablation trained with the Muon optimizer and Single-Scale RMSNorm. [100B tokens]
dmis-lab/OSP-1.4B-100B-Muon-EmbProj
Note: Ablation trained with the Muon optimizer and an Embedding Projection matrix. [100B tokens]
dmis-lab/OSP-1.4B-100B-Muon-SSNorm-EmbProj
Note: Ablation trained with the full OSP framework. [100B tokens]
dmis-lab/OSP-1.4B-100B-Shampoo-SSNorm
Note: Ablation trained with the Shampoo optimizer and Single-Scale RMSNorm. [100B tokens]
dmis-lab/OSP-1.4B-100B-Shampoo-SSNorm-EmbProj
Note: Ablation trained with the Shampoo optimizer in place of Muon, keeping OSP's other components. [100B tokens]
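The repository names above appear to follow a pattern of the form OSP-<size>-<tokens>-<optimizer>[-<component>...]. The sketch below decodes a name into its ablation settings, which can help when iterating over the collection in scripts. The pattern is inferred from the listing only and is not a documented convention; `OSPVariant` and `parse_osp_repo_id` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Suffixes seen in the listing above; treated here as the only valid ones (assumption).
KNOWN_FLAGS = {"SSNorm", "EmbProj", "Only"}


@dataclass(frozen=True)
class OSPVariant:
    """Ablation settings decoded from a repository name (hypothetical helper type)."""
    repo_id: str
    params: str       # e.g. "1.4B"
    tokens: str       # e.g. "100B" or "1T"
    optimizer: str    # e.g. "Adam", "Muon", "Shampoo"
    ssnorm: bool      # Single-Scale RMSNorm enabled
    embproj: bool     # Embedding Projection matrix enabled
    muon_only: bool   # "-Only" suffix: no Adam optimizer at embeddings


def parse_osp_repo_id(repo_id: str) -> OSPVariant:
    """Split a name like 'dmis-lab/OSP-1.4B-100B-Muon-SSNorm' into its fields."""
    name = repo_id.rsplit("/", 1)[-1]  # drop the organization prefix if present
    parts = name.split("-")
    if len(parts) < 4 or parts[0] != "OSP":
        raise ValueError(f"not an OSP-style repository name: {repo_id!r}")
    _, params, tokens, optimizer, *flags = parts
    unknown = set(flags) - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unrecognized suffixes in {repo_id!r}: {sorted(unknown)}")
    return OSPVariant(
        repo_id=repo_id,
        params=params,
        tokens=tokens,
        optimizer=optimizer,
        ssnorm="SSNorm" in flags,
        embproj="EmbProj" in flags,
        muon_only="Only" in flags,
    )


if __name__ == "__main__":
    for rid in (
        "dmis-lab/OSP-1.4B-1T-Muon-SSNorm-EmbProj",
        "dmis-lab/OSP-1.4B-100B-Muon-Only",
        "dmis-lab/OSP-1.4B-100B-Shampoo-SSNorm",
    ):
        print(parse_osp_repo_id(rid))
```

Rejecting unknown suffixes rather than ignoring them is a deliberate choice in this sketch: if a new variant is added to the collection, the parser fails loudly instead of silently mislabeling it.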