🔬 SnowflakeCore-G0-Release-3 Architecture Size Report

Summary

This document provides a detailed breakdown of the parameter count and structural design of the SnowflakeCore-G0-Release-3 model. SnowflakeCore-G0-Release-3 is a custom decoder-only transformer model built from scratch, designed for autoregressive language modeling with rotary positional embeddings (RoPE).


๐Ÿ“ Model Architecture Overview

Component               Value
----------------------  ------------------------
Architecture Type       Decoder-only Transformer
Hidden Size (d_model)   1536
Number of Layers        32
Attention Heads         16
Feedforward Dim (d_ff)  6144
Max Sequence Length     2048
Positional Encoding     Rotary (RoPE)
Vocabulary Size         50,000 (assumed)
Total Parameters        ≈ 1.06 billion
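
For quick reference in code, the table above can be captured in a small configuration object. This is an illustrative sketch only: the class and field names are hypothetical, not the model's actual configuration API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SnowflakeCoreG0Config:
    # Values taken from the architecture table; vocab_size is assumed, as noted above.
    d_model: int = 1536
    n_layers: int = 32
    n_heads: int = 16
    d_ff: int = 6144
    max_seq_len: int = 2048
    vocab_size: int = 50_000

    @property
    def d_head(self) -> int:
        # Per-head dimension: 1536 // 16 = 96
        return self.d_model // self.n_heads

cfg = SnowflakeCoreG0Config()
print(cfg.d_head)  # 96
```

Note that d_ff = 4 · d_model, the conventional 4× FFN expansion ratio.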

🧮 Parameter Count Breakdown

1. Embedding Layers

  • Token Embedding: V × d = 50,000 × 1536 = 76.8M
  • Output Projection: d × V = 1536 × 50,000 = 76.8M

Total:

P_embedding = 2 · 1536 · 50,000 = 153.6M

2. Transformer Blocks

Each of the 32 layers contains:

  • Multi-Head Attention (Q, K, V, Out projections):
    4 · d² = 4 · 1536² ≈ 9.44M

  • Feedforward Network (MLP, two weight matrices):
    2 · d · d_ff = 2 · 1536 · 6144 ≈ 18.87M

  • Total per Layer:
    9.44M + 18.87M ≈ 28.31M

  • Total across 32 layers:
    32 · 28.31M ≈ 905.97M

(Bias terms and layer-norm parameters are omitted from these figures; they contribute well under 1% of the total.)

3. Positional Embedding

  • Type: Rotary Positional Embeddings (RoPE)
  • Parameter Count: 0 (non-learned, sinusoidal basis)
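
Why RoPE contributes zero parameters: the rotation angles are a fixed function of token position and head dimension, computed on the fly rather than stored as learned weights. The following is an illustrative NumPy sketch, not the model's actual implementation; it uses the per-head dimension d_head = 1536 / 16 = 96.

```python
import numpy as np

def rope_angles(seq_len: int, d_head: int, base: float = 10000.0) -> np.ndarray:
    """Fixed RoPE rotation angles: nothing here is learned.

    Entry (t, i) is t * base**(-2i / d_head), the angle used to rotate
    the i-th coordinate pair at position t.
    """
    inv_freq = base ** (-np.arange(0, d_head, 2) / d_head)  # (d_head // 2,)
    positions = np.arange(seq_len)                          # (seq_len,)
    return np.outer(positions, inv_freq)                    # (seq_len, d_head // 2)

def apply_rope(x: np.ndarray) -> np.ndarray:
    """Rotate consecutive coordinate pairs of x (shape: seq_len, d_head)."""
    seq_len, d_head = x.shape
    theta = rope_angles(seq_len, d_head)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because the rotations are pure functions of position, position 0 is left unchanged and every vector keeps its norm; no parameter table grows with sequence length.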

📊 Final Parameter Estimate

Total Parameters ≈ P_embedding + P_transformer = 153.6M + 905.97M ≈ 1,059.6M
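
The whole estimate can be reproduced in a few lines. This is a sketch under the same assumptions as the breakdown above: a 50,000-token vocabulary, untied input/output embeddings, and no bias or layer-norm terms.

```python
d_model, n_layers, d_ff, vocab = 1536, 32, 6144, 50_000

# Embeddings: token embedding + (untied) output projection
p_embedding = 2 * vocab * d_model                    # 153,600,000

# Per layer: Q, K, V, Out projections plus two FFN matrices
p_attn = 4 * d_model ** 2                            # 9,437,184
p_ffn = 2 * d_model * d_ff                           # 18,874,368
p_transformer = n_layers * (p_attn + p_ffn)          # 905,969,664

total = p_embedding + p_transformer
print(f"{total:,}")  # 1,059,569,664 ≈ 1.06B
```

If the embeddings were tied (one shared matrix for input and output), the total would drop by 76.8M to roughly 0.98B.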

🧠 Training Regime (Contextual)

Item                     Value
-----------------------  ---------------
Training Dataset Size    ~2 million rows
Max Tokens per Sequence  2048
Effective Batch Size     32 × 4 = 128
Number of Epochs         15
Optimizer                AdamW
Learning Rate            3 × 10⁻⁴

Approximate number of tokens per epoch:

2M rows × avg_tokens_per_row ≤ 2M × 2048 ≈ 4.1B tokens
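
A quick check on this bound (a sketch assuming the worst case, that every row is packed to the full 2048-token maximum; real rows are shorter, so actual counts are lower):

```python
rows = 2_000_000
max_tokens_per_row = 2048

tokens_per_epoch = rows * max_tokens_per_row   # upper bound, one epoch
tokens_total = tokens_per_epoch * 15           # upper bound across 15 epochs

print(f"{tokens_per_epoch:,}")  # 4,096,000,000  (~4.1B)
print(f"{tokens_total:,}")      # 61,440,000,000 (~61.4B)
```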

🧾 Notes

  • SnowflakeCore-G0-Release-3 exceeds the size of GPT-2 Large (~774M parameters).
  • With RoPE and 32 layers, the model is well-positioned for long-range generalization.
  • At roughly 1.06B parameters, the model sits in the mid-scale range of open language models.

📦 Conclusion

SnowflakeCore-G0-Release-3 is a rigorously engineered, 1.06B parameter language model with modern architectural choices (RoPE, deep stack, wide FFN) that position it as a strong open foundation model for further research, deployment, and extension.
