SnowflakeCore-G0-Release-3 Architecture Size Report
Summary
This document provides a detailed breakdown of the parameter count and structural design of the SnowflakeCore-G0-Release-3 model. SnowflakeCore-G0-Release-3 is a custom decoder-only transformer model built from scratch, designed for autoregressive language modeling with rotary positional embeddings (RoPE).
Model Architecture Overview
Component | Value |
---|---|
Architecture Type | Decoder-only Transformer |
Hidden Size (d_model) | 1536 |
Number of Layers | 32 |
Attention Heads | 16 |
Feedforward Dim (d_ff) | 6144 |
Max Sequence Length | 2048 |
Positional Encoding | Rotary (RoPE) |
Vocabulary Size | 50,000 (assumed) |
Total Parameters | ≈ 1.06 Billion |
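For orientation, the table above maps onto a configuration along these lines (a sketch only; the field names are illustrative and are not the model's actual config keys):

```python
from dataclasses import dataclass

@dataclass
class SnowflakeCoreG0Config:
    """Values from the table above; field names are illustrative, not the real config keys."""
    vocab_size: int = 50_000     # assumed vocabulary size
    d_model: int = 1536          # hidden size
    n_layers: int = 32           # decoder blocks
    n_heads: int = 16            # attention heads (head_dim = 1536 // 16 = 96)
    d_ff: int = 6144             # feedforward (MLP) width
    max_seq_len: int = 2048      # maximum sequence length
    use_rope: bool = True        # rotary positional embeddings, no learned position table
```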
Parameter Count Breakdown
1. Embedding Layers
- Token Embedding: V × d = 50,000 × 1536 = 76.8M
- Output Projection: d × V = 1536 × 50,000 = 76.8M (counted separately, i.e. not tied to the input embedding)
Total:
P_embedding = 2 · 1536 · 50,000 = 153.6M
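The same count as a quick runnable check (a sketch; the variable names are ours, not identifiers from the model's codebase):

```python
vocab_size, d_model = 50_000, 1536

token_embedding = vocab_size * d_model    # input embedding matrix
lm_head = d_model * vocab_size            # output projection, counted separately (not tied)
p_embedding = token_embedding + lm_head

print(f"P_embedding = {p_embedding / 1e6:.1f}M")   # 153.6M
```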
2. Transformer Blocks
Each of the 32 layers contains:
- Multi-Head Attention (Q, K, V, Out projections): 4 · d² = 4 · 1536² = 9.44M
- Feedforward Network (MLP): 2 · d · d_ff = 2 · 1536 · 6144 = 18.87M
- Total per Layer: 9.44M + 18.87M = 28.31M
- Total across 32 layers: 32 · 28.31M = 905.97M

(Bias and LayerNorm parameters are omitted from this estimate; they contribute well under 1% of the total.)
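The per-layer arithmetic as a runnable check, under the same simplifications (no biases, no LayerNorm):

```python
d_model, d_ff, n_layers = 1536, 6144, 32

attention = 4 * d_model ** 2        # Q, K, V and output projection matrices
mlp = 2 * d_model * d_ff            # up- and down-projection of the feedforward block
per_layer = attention + mlp

print(f"attention  = {attention / 1e6:.2f}M")             # 9.44M
print(f"mlp        = {mlp / 1e6:.2f}M")                    # 18.87M
print(f"per layer  = {per_layer / 1e6:.2f}M")              # 28.31M
print(f"all layers = {n_layers * per_layer / 1e6:.2f}M")   # 905.97M
```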
3. Positional Embedding
- Type: Rotary Positional Embeddings (RoPE)
- Parameter Count: 0 (RoPE applies fixed sinusoidal rotations to the queries and keys; nothing is learned)
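To make the zero-parameter claim concrete, here is a minimal, self-contained sketch of interleaved RoPE in PyTorch; the actual SnowflakeCore implementation may pair channels or cache the angle tables differently:

```python
import torch

def rope_cos_sin(seq_len: int, head_dim: int, base: float = 10000.0):
    """Fixed rotation angles: one frequency per channel pair. Nothing here is learned."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)        # (seq_len, head_dim // 2)
    return angles.cos(), angles.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate each (even, odd) channel pair of x; x has shape (..., seq_len, head_dim)."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x_even * cos - x_odd * sin,
                           x_even * sin + x_odd * cos), dim=-1)
    return rotated.flatten(-2)                       # interleave the pairs back together

# For this architecture: 16 heads of dimension 1536 // 16 = 96, sequences up to 2048 tokens.
q = torch.randn(1, 16, 2048, 96)                     # (batch, heads, seq_len, head_dim)
cos, sin = rope_cos_sin(seq_len=2048, head_dim=96)
q_rotated = apply_rope(q, cos, sin)                  # same shape, zero extra parameters
```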
Final Parameter Estimate
Total Parameters ≈ P_embedding + P_transformer = 153.6M + 905.97M ≈ 1,059.6M (≈ 1.06B)
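The full estimate in one place, with the same simplifications as above (untied embeddings, no biases or LayerNorm weights):

```python
def estimate_params(vocab_size=50_000, d_model=1536, d_ff=6144, n_layers=32):
    """Rough parameter count; ignores biases, LayerNorm, and any weight tying."""
    embedding = 2 * vocab_size * d_model               # token embedding + output projection
    per_layer = 4 * d_model ** 2 + 2 * d_model * d_ff  # attention + feedforward
    return embedding + n_layers * per_layer

print(f"{estimate_params() / 1e9:.2f}B")               # ~1.06B
```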
Training Regime (Contextual)
Item | Value |
---|---|
Training Dataset Size | ~2 million rows |
Max Tokens per Sequence | 2048 |
Effective Batch Size | 32 × 4 = 128 |
Number of Epochs | 15 |
Optimizer | AdamW |
Learning Rate | 3 × 10⁻⁴ |
Approximate number of tokens per epoch:
2M rows × avg_tokens_per_row ≤ 2M × 2048 ≈ 4.1B tokens
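The implied token and step budget, treating the 2048-token maximum as an upper bound on row length (actual rows may be shorter):

```python
rows = 2_000_000        # training dataset size
max_seq_len = 2048      # tokens per row, upper bound
effective_batch = 128   # 32 x 4
epochs = 15

tokens_per_epoch = rows * max_seq_len        # <= ~4.1B tokens per epoch
steps_per_epoch = rows // effective_batch    # 15,625 optimizer steps per epoch

print(f"{tokens_per_epoch / 1e9:.1f}B tokens/epoch (upper bound)")
print(f"{tokens_per_epoch * epochs / 1e9:.0f}B tokens over {epochs} epochs (upper bound)")
print(f"{steps_per_epoch * epochs:,} optimizer steps in total")
```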
Notes
- SnowflakeCore-G0-Release-3 exceeds the size of GPT-2 Large (~774M parameters).
- With RoPE (which encodes relative positions through fixed rotations rather than a learned position table) and a 32-layer stack, the model is well-positioned for long-range generalization within its 2048-token context.
- Combined with a training budget of roughly 60B tokens (15 epochs × up to ~4.1B per epoch, about 57 tokens per parameter), this parameter count sits comfortably in the mid-scale regime and well above the ~20 tokens-per-parameter ratio suggested by compute-optimal scaling analyses.
Conclusion
SnowflakeCore-G0-Release-3 is a rigorously engineered, 1.06B parameter language model with modern architectural choices (RoPE, deep stack, wide FFN) that position it as a strong open foundation model for further research, deployment, and extension.