A newer version of this model is available: FlameF0X/SnowflakeCore-G0-Release-2.5

SnowflakeCore-G0-Release-2

This is the initial release of the SnowflakeCore series of language models, trained on the DialogMLM-50K dataset with a focus on optimized memory usage.

SUPPORT ME

You can support me via https://ko-fi.com/flamef0x

Model details

  • Architecture: SnowflakeCore
  • Hidden size: 768
  • Number of attention heads: 12
  • Number of layers: 8
  • Feed-forward dimension: 1536
  • Maximum sequence length: 768
  • Vocabulary size: 30522
  • Parameters: ~61.9M (stored in F16)
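
For quick reference, the hyperparameters above can be collected into a plain Python dictionary (the key names below are illustrative and may not match the released config.json):

# Illustrative hyperparameter summary for SnowflakeCore-G0-Release-2
# (key names are hypothetical; consult the repository's config.json for the real ones)
snowflake_config = {
    "hidden_size": 768,
    "num_attention_heads": 12,
    "num_hidden_layers": 8,
    "intermediate_size": 1536,        # feed-forward dimension
    "max_position_embeddings": 768,   # maximum sequence length
    "vocab_size": 30522,
}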

Flowchart

[Architecture flowchart image]

HuggingFace Transformers Compatibility

This model is fully compatible with the HuggingFace Transformers library. You can load it using:

from transformers import AutoConfig, AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("FlameF0X/SnowflakeCore-G0-Release-2")
config = AutoConfig.from_pretrained("FlameF0X/SnowflakeCore-G0-Release-2")
model = AutoModel.from_pretrained("FlameF0X/SnowflakeCore-G0-Release-2")
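
A minimal usage sketch, assuming the model follows the standard Transformers base-model output convention (the prompt text is arbitrary):

# Tokenize a prompt and run a single forward pass
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model(**inputs)

# The base model returns hidden states; downstream heads (e.g. for generation) build on these
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size) -> (1, seq_len, 768)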

Memory Optimization Techniques

  • Mixed precision training
  • Gradient accumulation (8 steps)
  • Fused QKV projection (see the sketch after this list)
  • Pre-norm architecture
  • Weight tying between embedding and output layers
  • Half-precision model storage
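
As a simplified sketch of the fused QKV idea (not the actual SnowflakeCore implementation), the query, key, and value projections can be computed by a single matrix multiplication and then split:

import torch.nn as nn

class FusedQKV(nn.Module):
    # One linear layer produces queries, keys and values in a single matmul,
    # instead of three separate projection layers.
    def __init__(self, hidden_size=768):
        super().__init__()
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x):                       # x: (batch, seq_len, hidden_size)
        q, k, v = self.qkv(x).chunk(3, dim=-1)  # split the fused projection
        return q, k, v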

The model weights are stored in both PyTorch (.bin) and safetensors formats for improved security, loading efficiency, and compatibility.
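
For example, the safetensors weights can be inspected directly with the safetensors library (a sketch that assumes the checkpoint has already been downloaded locally as model.safetensors):

from safetensors.torch import load_file

# Load the half-precision weights without executing any pickled code
state_dict = load_file("model.safetensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)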

Training

Epoch | Train Loss | Val Loss
0     | 10.0000    | 10.0000
1     | 5.1290     | 4.3137
2     | 4.1629     | 3.8085
3     | 3.7087     | 3.4156
4     | 3.4236     | 3.2198
5     | 3.2251     | 3.0678
6     | 3.0599     | 2.9335
7     | 2.9571     | 2.8617
8     | 2.8831     | 2.7782
9     | 2.8003     | 2.7345
10    | 2.7579     | 2.6981
11    | 2.7128     | 2.6385
12    | 2.6783     | 2.6337
13    | 2.6571     | 2.5944
14    | 2.6261     | 2.5631
15    | 2.5919     | 2.5353
16    | 2.5592     | 2.5121
17    | 2.5359     | 2.4859
18    | 2.4998     | 2.4626
19    | 2.4746     | 2.4328
20    | 2.4631     | 2.4222
21    | 2.4374     | 2.3956
22    | 2.3924     | 2.3491
23    | 2.3540     | 2.3074
24    | 2.3207     | 2.2809
25    | 2.2994     | 2.2597
26    | 2.2737     | 2.2409
27    | 2.2595     | 2.2270
28    | 2.2353     | 2.2097
29    | 2.2030     | 2.1535
30    | 2.1648     | 2.1272
31    | 2.1375     | 2.1125
32    | 2.1189     | 2.0834
33    | 2.1056     | 2.0825
34    | 2.0820     | 2.0599
35    | 2.0643     | 2.0428
36    | 2.0451     | 2.0174
37    | 2.0256     | 2.0082
38    | 2.0099     | 1.9930
39    | 1.9937     | 1.9795
40    | 1.9753     | 1.9687

Complexity

The overall time complexity of training SnowflakeCore-G0-Release-2 falls under the O(n²) class due to the self-attention mechanism used in the transformer architecture. Here's a breakdown of the major computational costs:

  • Self-Attention: O(n² · d), where n is the sequence length (768) and d is the hidden size (768). This term dominates because each token attends to every other token.

  • Feedforward Layers: O(n · d²), with two projection layers per block.

  • Stacked Layers: Multiplied by the number of layers L = 8.

Overall per-step complexity (here n = d = 768, so the attention and feed-forward terms are of the same order):

O(L · (n² · d + n · d²)) ≈ O(n² · d · L)

Training over the dataset for E = 40 epochs with batch size B = 8 gives a total training complexity of:

O(E · (N / B) · n² · d · L), where N is the number of training samples.

This puts SnowflakeCore-G0-Release-2 in the O(n²) class with respect to sequence length, which is the key scaling bottleneck. Optimizations such as the fused QKV projection, gradient accumulation, and mixed-precision training help reduce the practical training cost.
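
Plugging the model's own numbers into the per-step formula above gives a rough sense of scale (a back-of-the-envelope sketch; constant factors and minor terms are ignored):

# Rough per-sequence cost at full context length, ignoring constant factors
n, d, L = 768, 768, 8            # sequence length, hidden size, number of layers
attention = n * n * d            # each token attends to every other token
feedforward = n * d * 1536 * 2   # two projections with feed-forward dimension 1536
per_step = L * (attention + feedforward)
print(f"~{per_step / 1e9:.1f}e9 multiply-accumulates per sequence (order of magnitude only)")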
