Models with 1.0B parameters trained on 1.0B tokens. Architecture: hidden size H=2048, FFN=8192, Heads=16, Layers=16.
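
As a rough sanity check on the stated size, the sketch below estimates the parameter count implied by these dimensions. It assumes a standard Transformer block with four H×H attention projections and a two-matrix FFN, and it ignores biases and normalization parameters; none of these details are given in the source. The function name and the `vocab_size` and `tied_embeddings` arguments are hypothetical, and the vocabulary size is left at zero because the source does not state it.

```python
def transformer_param_count(hidden, ffn, layers, vocab_size=0, tied_embeddings=True):
    """Approximate parameter count of a Transformer stack (assumed layout)."""
    # Attention: Q, K, V and output projections, each hidden x hidden (assumption).
    attn = 4 * hidden * hidden
    # Feed-forward: up and down projections only (assumption: no gated third matrix).
    mlp = 2 * hidden * ffn
    per_layer = attn + mlp
    # Embeddings: one table if input/output weights are tied, two otherwise.
    embed = vocab_size * hidden * (1 if tied_embeddings else 2)
    return layers * per_layer + embed


# Dimensions from the text: H=2048, FFN=8192, Heads=16, Layers=16.
non_embedding = transformer_param_count(hidden=2048, ffn=8192, layers=16)
head_dim = 2048 // 16  # per-head dimension implied by H and Heads

print(non_embedding)  # 805306368
print(head_dim)       # 128
```

Under these assumptions the blocks alone account for about 0.8B parameters, so the remainder of the stated 1.0B would have to come from embeddings and any terms omitted here.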