SnowflakeCore-G0-Release-3 Architecture Size Report
Summary
This document provides a detailed breakdown of the parameter count and structural design of the SnowflakeCore-G0-Release-3 model. SnowflakeCore-G0-Release-3 is a custom decoder-only transformer model built from scratch, designed for autoregressive language modeling with rotary positional embeddings (RoPE).
Model Architecture Overview
Component | Value |
---|---|
Architecture Type | Decoder-only Transformer |
Hidden Size (d_model) | 1536 |
Number of Layers | 32 |
Attention Heads | 16 |
Feedforward Dim (d_ff) | 6144 |
Max Sequence Length | 2048 |
Positional Encoding | Rotary (RoPE) |
Vocabulary Size | 50,000 (assumed) |
Total Parameters | ≈ 1.06 Billion |
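For orientation, the table above maps onto a configuration along these lines (a sketch only; the field names are illustrative and are not the model's actual config keys):

```python
from dataclasses import dataclass

@dataclass
class SnowflakeCoreG0Config:
    """Values from the table above; field names are illustrative, not the real config keys."""
    vocab_size: int = 50_000     # assumed vocabulary size
    d_model: int = 1536          # hidden size
    n_layers: int = 32           # decoder blocks
    n_heads: int = 16            # attention heads (head_dim = 1536 // 16 = 96)
    d_ff: int = 6144             # feedforward (MLP) width
    max_seq_len: int = 2048      # maximum sequence length
    use_rope: bool = True        # rotary positional embeddings, no learned position table
```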
Parameter Count Breakdown
1. Embedding Layers
- Token Embedding: V × d = 50,000 × 1536 = 76.8M
- Output Projection: d × V = 1536 × 50,000 = 76.8M (counted separately, i.e. not tied to the input embedding)
Total:
P_embedding = 2 · 1536 · 50,000 = 153.6M
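The same count as a quick runnable check (a sketch; the variable names are ours, not identifiers from the model's codebase):

```python
vocab_size, d_model = 50_000, 1536

token_embedding = vocab_size * d_model    # input embedding matrix
lm_head = d_model * vocab_size            # output projection, counted separately (not tied)
p_embedding = token_embedding + lm_head

print(f"P_embedding = {p_embedding / 1e6:.1f}M")   # 153.6M
```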
2. Transformer Blocks
Each of the 32 layers contains:
- Multi-Head Attention (Q, K, V, Out projections): 4 · d² = 4 · 1536² = 9.44M
- Feedforward Network (MLP): 2 · d · d_ff = 2 · 1536 · 6144 = 18.87M
- Total per Layer: 9.44M + 18.87M = 28.31M
- Total across 32 layers: 32 · 28.31M = 905.97M

(Bias and LayerNorm parameters are omitted from this estimate; they contribute well under 1% of the total.)
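The per-layer arithmetic as a runnable check, under the same simplifications (no biases, no LayerNorm):

```python
d_model, d_ff, n_layers = 1536, 6144, 32

attention = 4 * d_model ** 2        # Q, K, V and output projection matrices
mlp = 2 * d_model * d_ff            # up- and down-projection of the feedforward block
per_layer = attention + mlp

print(f"attention  = {attention / 1e6:.2f}M")             # 9.44M
print(f"mlp        = {mlp / 1e6:.2f}M")                    # 18.87M
print(f"per layer  = {per_layer / 1e6:.2f}M")              # 28.31M
print(f"all layers = {n_layers * per_layer / 1e6:.2f}M")   # 905.97M
```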
3. Positional Embedding
- Type: Rotary Positional Embeddings (RoPE)
- Parameter Count: 0 (RoPE applies fixed sinusoidal rotations to the queries and keys; nothing is learned)
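To make the zero-parameter claim concrete, here is a minimal, self-contained sketch of interleaved RoPE in PyTorch; the actual SnowflakeCore implementation may pair channels or cache the angle tables differently:

```python
import torch

def rope_cos_sin(seq_len: int, head_dim: int, base: float = 10000.0):
    """Fixed rotation angles: one frequency per channel pair. Nothing here is learned."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)        # (seq_len, head_dim // 2)
    return angles.cos(), angles.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate each (even, odd) channel pair of x; x has shape (..., seq_len, head_dim)."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x_even * cos - x_odd * sin,
                           x_even * sin + x_odd * cos), dim=-1)
    return rotated.flatten(-2)                       # interleave the pairs back together

# For this architecture: 16 heads of dimension 1536 // 16 = 96, sequences up to 2048 tokens.
q = torch.randn(1, 16, 2048, 96)                     # (batch, heads, seq_len, head_dim)
cos, sin = rope_cos_sin(seq_len=2048, head_dim=96)
q_rotated = apply_rope(q, cos, sin)                  # same shape, zero extra parameters
```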
Final Parameter Estimate
Total Parameters ≈ P_embedding + P_transformer = 153.6M + 905.97M ≈ 1,059.6M (≈ 1.06B)
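The full estimate in one place, with the same simplifications as above (untied embeddings, no biases or LayerNorm weights):

```python
def estimate_params(vocab_size=50_000, d_model=1536, d_ff=6144, n_layers=32):
    """Rough parameter count; ignores biases, LayerNorm, and any weight tying."""
    embedding = 2 * vocab_size * d_model               # token embedding + output projection
    per_layer = 4 * d_model ** 2 + 2 * d_model * d_ff  # attention + feedforward
    return embedding + n_layers * per_layer

print(f"{estimate_params() / 1e9:.2f}B")               # ~1.06B
```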
Training Regime (Contextual)
Item | Value |
---|---|
Training Dataset Size | ~2 million rows |
Max Tokens per Sequence | 2048 |
Effective Batch Size | 32 × 4 = 128 |
Number of Epochs | 15 |
Optimizer | AdamW |
Learning Rate | 3 × 10⁻⁴ |
Approximate number of tokens per epoch:
2M rows × avg_tokens_per_row ≤ 2M × 2048 ≈ 4.1B tokens
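The implied token and step budget, treating the 2048-token maximum as an upper bound on row length (actual rows may be shorter):

```python
rows = 2_000_000        # training dataset size
max_seq_len = 2048      # tokens per row, upper bound
effective_batch = 128   # 32 x 4
epochs = 15

tokens_per_epoch = rows * max_seq_len        # <= ~4.1B tokens per epoch
steps_per_epoch = rows // effective_batch    # 15,625 optimizer steps per epoch

print(f"{tokens_per_epoch / 1e9:.1f}B tokens/epoch (upper bound)")
print(f"{tokens_per_epoch * epochs / 1e9:.0f}B tokens over {epochs} epochs (upper bound)")
print(f"{steps_per_epoch * epochs:,} optimizer steps in total")
```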
Notes
- SnowflakeCore-G0-Release-3 exceeds the size of GPT-2 Large (~774M parameters).
- With RoPE (which encodes relative positions through fixed rotations rather than a learned position table) and a 32-layer stack, the model is well-positioned for long-range generalization within its 2048-token context.
- Combined with a training budget of roughly 60B tokens (15 epochs × up to ~4.1B per epoch, about 57 tokens per parameter), this parameter count sits comfortably in the mid-scale regime and well above the ~20 tokens-per-parameter ratio suggested by compute-optimal scaling analyses.
Conclusion
SnowflakeCore-G0-Release-3 is a rigorously engineered, 1.06B parameter language model with modern architectural choices (RoPE, deep stack, wide FFN) that position it as a strong open foundation model for further research, deployment, and extension.