Outlier-Safe Pre-Training (OSP)
A collection of the final and ablation models trained with the Outlier-Safe Pre-Training (OSP) framework.
- Paper: arXiv:2506.19697
dmis-lab/OSP-1.4B-1T-Muon-SSNorm-EmbProj
Note: Trained with the OSP framework. This is our final model.
dmis-lab/OSP-1.4B-1T-Adam
Note: Trained with the standard Adam optimizer, without any modifications.
dmis-lab/OSP-1.4B-100B-Adam
Note: Ablation trained with the standard Adam optimizer, without any modifications. [100B tokens]
dmis-lab/OSP-1.4B-100B-Muon
Note: Ablation trained with the Muon optimizer, without any modifications. [100B tokens]
dmis-lab/OSP-1.4B-100B-Muon-Only
Note: Ablation trained with the Muon optimizer, without any modifications. Adam is not used for the embeddings. [100B tokens]
dmis-lab/OSP-1.4B-100B-Muon-SSNorm
Note: Ablation trained with the Muon optimizer and Single-Scale RMSNorm. [100B tokens]
dmis-lab/OSP-1.4B-100B-Muon-EmbProj
Note: Ablation trained with the Muon optimizer and an Embedding Projection matrix. [100B tokens]
dmis-lab/OSP-1.4B-100B-Muon-SSNorm-EmbProj
Note: Ablation trained with the OSP framework. [100B tokens]
dmis-lab/OSP-1.4B-100B-Shampoo-SSNorm
Note: Ablation trained with the Shampoo optimizer and Single-Scale RMSNorm. [100B tokens]
dmis-lab/OSP-1.4B-100B-Shampoo-SSNorm-EmbProj
Note: Ablation trained with the Shampoo optimizer and OSP's other components (Single-Scale RMSNorm and Embedding Projection). [100B tokens]
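The repository names above appear to follow the pattern `OSP-<params>-<tokens>-<optimizer>[-<modifier>...]`. As a minimal sketch, assuming that pattern holds (it is inferred from the names listed here, not documented by the authors), the checkpoints can be split into fields to select ablations programmatically. The `OSPCheckpoint` class and `parse_osp_repo_id` function are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OSPCheckpoint:
    """Fields recovered from a repository name (hypothetical helper type)."""
    repo_id: str
    params: str                  # e.g. "1.4B"
    tokens: str                  # e.g. "100B" or "1T"
    optimizer: str               # e.g. "Adam", "Muon", "Shampoo"
    modifiers: Tuple[str, ...]   # e.g. ("SSNorm", "EmbProj") or ("Only",)


def parse_osp_repo_id(repo_id: str) -> OSPCheckpoint:
    """Split a repo id, assuming OSP-<params>-<tokens>-<optimizer>[-<modifier>...]."""
    # Separate the organisation from the model name first, because the
    # organisation ("dmis-lab") itself contains a hyphen.
    _, _, name = repo_id.partition("/")
    parts = name.split("-")
    if len(parts) < 4 or parts[0] != "OSP":
        raise ValueError(f"unexpected repository name: {repo_id!r}")
    _, params, tokens, optimizer, *modifiers = parts
    return OSPCheckpoint(repo_id, params, tokens, optimizer, tuple(modifiers))


# Repository ids copied from the collection listing above.
REPO_IDS: List[str] = [
    "dmis-lab/OSP-1.4B-1T-Muon-SSNorm-EmbProj",
    "dmis-lab/OSP-1.4B-1T-Adam",
    "dmis-lab/OSP-1.4B-100B-Adam",
    "dmis-lab/OSP-1.4B-100B-Muon",
    "dmis-lab/OSP-1.4B-100B-Muon-Only",
    "dmis-lab/OSP-1.4B-100B-Muon-SSNorm",
    "dmis-lab/OSP-1.4B-100B-Muon-EmbProj",
    "dmis-lab/OSP-1.4B-100B-Muon-SSNorm-EmbProj",
    "dmis-lab/OSP-1.4B-100B-Shampoo-SSNorm",
    "dmis-lab/OSP-1.4B-100B-Shampoo-SSNorm-EmbProj",
]

CHECKPOINTS = [parse_osp_repo_id(r) for r in REPO_IDS]

# Example selection: every 100B-token ablation that uses Single-Scale RMSNorm.
ssnorm_ablations = [
    c.repo_id
    for c in CHECKPOINTS
    if c.tokens == "100B" and "SSNorm" in c.modifiers
]

if __name__ == "__main__":
    for repo_id in ssnorm_ablations:
        print(repo_id)
```

The parser reads names only and makes no network calls, so the selected `repo_id` strings can be passed to whichever loading tool is in use.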