SnowflakeCore-G1-Tiny2

An improved version of SnowflakeCore-G1-Tiny: a custom GPT-style transformer language model built from scratch in PyTorch and trained on the common-pile/wikimedia_filtered dataset.

Model Overview

SnowflakeCore-G1-Tiny2 is a GPT-style autoregressive transformer model with ~400M parameters designed for text generation tasks.

Key Features

  • 2048 token context window for extended conversations
  • Mixed precision training (BF16/FP16) for efficiency
  • Custom attention implementation with fused operations
  • Early stopping mechanisms for optimal training
  • Gradient accumulation for effective large batch training

Architecture Specifications

Component         Value
Model Type        Autoregressive Transformer
Parameters        ~400M
Layers            24
Hidden Size       1024
Attention Heads   16
Head Dimension    64
FFN Dimension     4096
Context Length    2048 tokens
Vocabulary Size   50,257 (GPT-2 tokenizer)
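
As a rough sanity check, the parameter count implied by these specifications can be reproduced with a few lines of arithmetic. The breakdown below assumes learned position embeddings, a tied output head, and ignores biases and layer norms (these details are not published, so treat it as an estimate):

vocab, ctx, d_model, n_layers, d_ffn = 50_257, 2048, 1024, 24, 4096

embeddings = vocab * d_model + ctx * d_model   # token + learned position embeddings
attention  = 4 * d_model * d_model             # Q, K, V and output projections
ffn        = 2 * d_model * d_ffn               # up- and down-projection
total      = embeddings + n_layers * (attention + ffn)

print(f"~{total / 1e6:.0f}M parameters")       # ~356M, i.e. the 355.9M reported in the benchmarks below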

Model Benchmarks

The following benchmarks compare SnowflakeCore-G1-Tiny2, its predecessor, and GPT-2 on key performance and text quality metrics.

Performance & Quality Metrics

Model                    Params   Size (MB)  Speed (tok/s)  Vocab Div.  Dist. Bigrams  Dist. Trigrams  Bigram Repet.  Trigram Repet.
SnowflakeCore-G1-Tiny2   355.9M   1357.54    22.13          0.3440      0.7408         0.8834          0.2592         0.1166
SnowflakeCore-G1-Tiny    355.9M   1357.54    22.12          0.2780      0.6111         0.7421          0.3889         0.2579
GPT-2 (small)            124.4M   474.70     47.73          0.2590      0.6408         0.7946          0.3592         0.2054

Notes:

  • Vocabulary Diversity = unique tokens / total tokens
  • Distinct N-grams = unique n-grams / total n-grams
  • Lower repetition rates indicate better text novelty
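
These quantities can be reproduced with a few lines of Python (the helper functions below are illustrative, not the benchmark harness itself):

def distinct_n(tokens, n):
    """Unique n-grams divided by total n-grams (n = 1 gives vocabulary diversity)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

def repetition_rate(tokens, n):
    """Share of repeated n-grams; the complement of distinct_n."""
    return 1.0 - distinct_n(tokens, n)

# Example on a whitespace-tokenized sample:
sample = "the cat sat on the mat and the cat slept".split()
print(distinct_n(sample, 1), distinct_n(sample, 2), repetition_rate(sample, 2))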

Memory Usage (CPU)

All models report N/A for CPU memory usage across all sequence lengths.

Sequence Length   SnowflakeCore-G1-Tiny   SnowflakeCore-G1-Tiny2   GPT-2
128               N/A (CPU)               N/A (CPU)                N/A
512               N/A (CPU)               N/A (CPU)                N/A
1024              N/A (CPU)               N/A (CPU)                N/A
2048              N/A (CPU)               N/A (CPU)                N/A

Quick Start

Installation

pip install torch transformers # if not already installed

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "FlameF0X/SnowflakeCore-G1-Tiny2",
    trust_remote_code=True,
    force_download=True,
    use_safetensors=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "FlameF0X/SnowflakeCore-G1-Tiny2",
    trust_remote_code=True,
    force_download=True,
)

def custom_greedy_generate(prompt, max_length=50):
    model.eval()
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    generated = input_ids

    with torch.no_grad():
        for _ in range(max_length):
            outputs = model(input_ids=generated)
            next_token_logits = outputs["logits"][:, -1, :]
            next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
            generated = torch.cat((generated, next_token_id), dim=1)

            if next_token_id.item() == tokenizer.eos_token_id:
                break

    return tokenizer.decode(generated[0], skip_special_tokens=True)

# Generate text
prompt = "Once upon a time"
result = custom_greedy_generate(prompt)
print(result)

Fine-Tuning

... (same fine-tuning code as above) ...

Training Details

Dataset

The model was trained on the common-pile/wikimedia_filtered dataset.

Training Configuration

  • Framework: PyTorch with mixed precision (BF16/FP16)
  • Optimizer: AdamW (learning rate: 2e-4)
  • Batch Size: 1 with gradient accumulation (32 steps)
  • Context Window: 2048 tokens
  • Validation Split: 10%
  • Early Stopping: Implemented at epoch and step levels
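
A minimal sketch of a single optimizer cycle under this configuration, reusing the `model` loaded in the Quick Start. The dataloader (`train_loader`), the device handling, and the clipping threshold are assumptions; the actual training script is not published here:

import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
accum_steps = 32   # batch size 1 with 32 accumulation steps -> effective batch of 32
amp_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

model.cuda().train()
for step, batch in enumerate(train_loader):              # train_loader is assumed
    input_ids = batch["input_ids"].cuda()                # sequences up to 2048 tokens
    with torch.autocast(device_type="cuda", dtype=amp_dtype):
        logits = model(input_ids=input_ids)["logits"]
        # Next-token cross-entropy, scaled so accumulated gradients average correctly.
        loss = F.cross_entropy(
            logits[:, :-1, :].reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),
        ) / accum_steps

    loss.backward()   # under FP16 a GradScaler would wrap this (see Training Stability)
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clipping threshold is an assumption
        optimizer.step()
        optimizer.zero_grad()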

Performance Monitoring

  • Training loss tracked per epoch with perplexity calculation
  • Full validation after each epoch
  • Step-level monitoring every 500 steps
  • Comprehensive metrics saved in training_metrics.json
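
Perplexity here is simply the exponential of the mean cross-entropy loss, e.g.:

import math

avg_loss = 3.2              # illustrative loss value, not a reported training number
print(math.exp(avg_loss))   # perplexity ≈ 24.5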

Technical Implementation

Attention Mechanism

  • Causal Masking: Supports autoregressive generation
  • Key Padding Mask: Enables batched inference
  • Scaled Dot-Product: Head dimension normalization included
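
A compact reference implementation of this attention pattern (illustrative only; the model's fused implementation is not reproduced here):

import math
import torch

def causal_attention(q, k, v, key_padding_mask=None):
    # q, k, v: (batch, heads, seq, head_dim); key_padding_mask: (batch, seq), True = pad
    seq_len, head_dim = q.size(-2), q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)   # scaled dot-product

    # Causal mask: position i may only attend to positions <= i.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device), 1)
    scores = scores.masked_fill(causal, float("-inf"))

    # Key padding mask: padded positions are never attended to (enables batched inference).
    if key_padding_mask is not None:
        scores = scores.masked_fill(key_padding_mask[:, None, None, :], float("-inf"))

    return torch.softmax(scores, dim=-1) @ v

# 16 heads with head dimension 64, as in the architecture table above.
out = causal_attention(*[torch.randn(1, 16, 8, 64) for _ in range(3)])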

Memory Optimization

  • Fused Operations: Reduces memory fragmentation
  • Mixed Precision: 30-40% memory reduction
  • Gradient Accumulation: Simulates larger batch sizes
  • Optional Quantization: Further model compression
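
For the optional quantization, one generic route is PyTorch dynamic INT8 quantization of the linear layers, sketched below; whether this matches the project's own quantization path is not specified:

import torch

# Post-training dynamic quantization for CPU inference (linear-layer weights in INT8).
quantized_model = torch.quantization.quantize_dynamic(
    model.eval(), {torch.nn.Linear}, dtype=torch.qint8
)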

Training Stability

  • Gradient Clipping: Prevents exploding gradients
  • Automatic Loss Scaling: Mixed precision stability
  • Early Stopping: Prevents overfitting with patience mechanisms
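
A minimal patience-based early-stopping helper of the kind described above (the class and its thresholds are illustrative, not the project's implementation):

class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best_loss, self.bad_checks = float("inf"), 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss, self.bad_checks = val_loss, 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# Applied at epoch level and at step level (e.g. on the 500-step validation checks):
stopper = EarlyStopping(patience=3)
# if stopper.should_stop(val_loss): break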

System Requirements

Memory Requirements

  • Training: 16-24GB VRAM (precision dependent)
  • Inference: 1-6GB VRAM for standard generation
  • Context: Maximum 2048 tokens input length

Generation Parameters

Default configuration:

{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
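
Because .generate() is not supported (see Limitations), these parameters have to be applied in a custom loop. Below is a hedged sketch of temperature, top-k, and top-p sampling, reusing the `model` and `tokenizer` from the Quick Start:

import torch

def sample_generate(prompt, max_new_tokens=50, temperature=1.0, top_k=50, top_p=0.9):
    model.eval()
    generated = tokenizer(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(input_ids=generated)["logits"][:, -1, :] / temperature

            # Top-k: keep only the k most likely tokens.
            topk_vals, topk_idx = torch.topk(logits, top_k)
            probs = torch.softmax(topk_vals, dim=-1)

            # Top-p (nucleus): drop the low-probability tail beyond cumulative mass top_p.
            sorted_probs, sorted_idx = torch.sort(probs, descending=True)
            keep = torch.cumsum(sorted_probs, dim=-1) <= top_p
            keep[..., 0] = True                        # always keep the most likely token
            sorted_probs = sorted_probs * keep
            sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)

            choice = torch.multinomial(sorted_probs, num_samples=1)
            next_id = topk_idx.gather(-1, sorted_idx.gather(-1, choice))
            generated = torch.cat((generated, next_id), dim=1)
            if next_id.item() == tokenizer.eos_token_id:   # 50256
                break

    return tokenizer.decode(generated[0], skip_special_tokens=True)

print(sample_generate("Once upon a time"))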

Model Files

The repository contains:

  • pytorch_model.bin - PyTorch model weights
  • model.safetensors - SafeTensors format weights
  • config.json - Model configuration
  • generation_config.json - Generation parameters
  • training_metrics.json - Training statistics
  • tokenizer.json - Tokenizer configuration
  • vocab.json & merges.txt - Vocabulary files

Limitations

  • No HuggingFace .generate() support: Use custom generation function
  • Output Quality: May produce repetitive or nonsensical text for some prompts
  • Hardware Requirements: GPU recommended for practical inference
  • Context Window: Limited to 2048 tokens
  • Dataset Dependency: Performance is tied to the quality of the common-pile/wikimedia_filtered training data

Example Output

N/A

Support Me

You can support me via Ko-fi or you can try my Vast.ai template!

Small meta-data

  • Release date: July 21, 2025.