# SnowflakeCore-G1-Tiny-Instruct
A custom GPT-style transformer language model built from scratch using PyTorch.
## Model Overview
SnowflakeCore-G1-Tiny and SnowflakeCore-G1-Tiny-Instruct are GPT-style autoregressive transformer models with ~400M parameters, designed for text generation tasks.
### Key Features
- 2048 token context window for extended conversations
- Mixed precision training (BF16/FP16) for efficiency
- Custom attention implementation with fused operations
- Early stopping mechanisms: N/A
- Gradient accumulation for effective large batch training
## Architecture Specifications

| Component | Value |
|-----------|-------|
| Model Type | Autoregressive Transformer |
| Parameters | ~400M |
| Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Head Dimension | 64 |
| FFN Dimension | 4096 |
| Context Length | 2048 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |
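For reference, the table above maps onto a configuration object roughly like the sketch below. The class and field names are illustrative only and are not the model's actual API.

```python
from dataclasses import dataclass

@dataclass
class SnowflakeCoreConfig:
    # Hypothetical config container mirroring the specification table;
    # names are placeholders, not the repository's real identifiers.
    vocab_size: int = 50257   # GPT-2 tokenizer vocabulary
    n_layers: int = 24        # transformer blocks
    hidden_size: int = 1024   # model (embedding) dimension
    n_heads: int = 16         # attention heads
    head_dim: int = 64        # hidden_size // n_heads
    ffn_dim: int = 4096       # feed-forward inner dimension (4x hidden)
    max_seq_len: int = 2048   # context window in tokens
```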
## Quick Start

### Installation

```bash
pip install torch transformers
```

### Basic Usage
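Because the model does not support HuggingFace's `.generate()` (see Limitations), generation is done with a manual sampling loop. The sketch below is a minimal example, assuming the forward pass returns raw logits of shape `[batch, seq, vocab]`; the checkpoint filename and loading code are placeholders, not the repository's actual API.

```python
import torch
from transformers import GPT2Tokenizer

# The model uses the GPT-2 tokenizer (50,257 tokens)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Placeholder: load the model however your checkpoint is packaged;
# the filename below is illustrative only.
model = torch.load("snowflakecore_g1_tiny_instruct.pt", map_location="cuda")
model.eval()

@torch.no_grad()
def generate(prompt, max_new_tokens=50, temperature=1.0, top_k=50):
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    for _ in range(max_new_tokens):
        logits = model(ids[:, -2048:])            # respect the 2048-token context window
        next_logits = logits[:, -1, :] / temperature
        topk = torch.topk(next_logits, top_k)     # keep only the top-k candidates
        probs = torch.softmax(topk.values, dim=-1)
        next_id = topk.indices.gather(-1, torch.multinomial(probs, 1))
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:  # stop at <|endoftext|> (50256)
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(generate("Explain what a transformer is in one sentence."))
```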
## Training Details

### Dataset

### Training Configuration
- Framework: PyTorch with mixed precision (BF16/FP16)
- Optimizer: AdamW (learning rate: 2e-4)
- Batch Size: N/A
- Context Window: 2048 tokens or 512 tokens
- Validation Split: N/A
- Early Stopping: N/A
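As a rough illustration of how these settings fit together (AdamW at 2e-4, BF16 autocast, gradient accumulation), a simplified training step might look like the sketch below. The accumulation factor, clipping value, `model`, and `train_loader` are placeholders, since the actual batch size and loop are not published.

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
accum_steps = 8  # placeholder: effective batch = micro-batch x accum_steps

model.train()
for step, batch in enumerate(train_loader):
    input_ids = batch["input_ids"].to("cuda")        # up to 2048 tokens per sequence
    with torch.autocast("cuda", dtype=torch.bfloat16):
        logits = model(input_ids[:, :-1])             # assumes raw logits output
        loss = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),             # next-token targets
        )
    (loss / accum_steps).backward()                   # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```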
### Performance Monitoring
- Training loss tracked per epoch with perplexity calculation
- Full validation after each epoch
- Step-level monitoring every 500 steps
- Comprehensive metrics saved in `training_metrics.json`
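Perplexity here is simply the exponential of the mean cross-entropy loss. Below is a minimal sketch of how such per-epoch metrics might be appended to `training_metrics.json`; the exact schema used by this repository is not specified, so the keys are assumptions.

```python
import json
import math

def log_metrics(epoch, train_loss, val_loss, path="training_metrics.json"):
    # Perplexity = exp(mean cross-entropy loss)
    record = {
        "epoch": epoch,
        "train_loss": train_loss,
        "train_perplexity": math.exp(train_loss),
        "val_loss": val_loss,
        "val_perplexity": math.exp(val_loss),
    }
    try:
        with open(path) as f:
            history = json.load(f)
    except FileNotFoundError:
        history = []
    history.append(record)
    with open(path, "w") as f:
        json.dump(history, f, indent=2)
```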
## Technical Implementation

### Attention Mechanism
- Causal Masking: Supports autoregressive generation
- Key Padding Mask: Enables batched inference
- Scaled Dot-Product: Head dimension normalization included
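A minimal sketch of the attention computation described above, using the standard scaled dot-product formulation with both masks; the model's actual fused implementation differs in structure but not in the math.

```python
import math
import torch

def causal_attention(q, k, v, key_padding_mask=None):
    # q, k, v: [batch, heads, seq, head_dim]; head_dim = 64 for this model
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # scale by sqrt(d_head)

    # Causal mask: position i may only attend to positions <= i
    seq = q.size(-2)
    causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))

    # Key padding mask: [batch, seq], True where a token is padding
    if key_padding_mask is not None:
        scores = scores.masked_fill(key_padding_mask[:, None, None, :], float("-inf"))

    return torch.softmax(scores, dim=-1) @ v
```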
### Memory Optimization
- Fused Operations: Reduces memory fragmentation
- Mixed Precision: 30-40% memory reduction
- Gradient Accumulation: Simulates larger batch sizes
- Optional Quantization: Further model compression
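The quantization bullet refers to optional post-training compression. One common way to do this in plain PyTorch is dynamic INT8 quantization of the linear layers, sketched below; whether this repository ships its own quantization path is not specified.

```python
import torch

# Dynamic quantization: weights stored in INT8, activations quantized on the fly.
# This affects only nn.Linear layers and is CPU-oriented in stock PyTorch.
quantized_model = torch.quantization.quantize_dynamic(
    model.eval().cpu(),
    {torch.nn.Linear},
    dtype=torch.qint8,
)
```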
### Training Stability
- Gradient Clipping: Prevents exploding gradients
- Automatic Loss Scaling: Mixed precision stability
- Early Stopping: Prevents overfitting with patience mechanisms
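When training in FP16 rather than BF16, the automatic loss scaling mentioned above is typically handled by `torch.cuda.amp.GradScaler`. The sketch below combines it with gradient clipping; `compute_loss`, `model`, `optimizer`, and the clip value of 1.0 are placeholders/assumptions.

```python
import torch

scaler = torch.cuda.amp.GradScaler()

with torch.autocast("cuda", dtype=torch.float16):
    loss = compute_loss(batch)          # placeholder loss computation

scaler.scale(loss).backward()           # scale the loss to avoid FP16 underflow
scaler.unscale_(optimizer)              # unscale before clipping the real gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
scaler.step(optimizer)                  # skips the step if gradients overflowed
scaler.update()
```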
## System Requirements

### Memory Requirements
- Training: 16-24GB VRAM (precision dependent)
- Inference: 4-6GB VRAM for standard generation
- Context: Maximum 2048 tokens input length
## Generation Parameters
Default configuration:
```json
{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
```
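Since sampling is handled by the custom generation function rather than `.generate()`, the `top_k`/`top_p` defaults above correspond to standard top-k and nucleus filtering of the next-token distribution. A minimal sketch of that filtering step (not the repository's actual code):

```python
import torch

def filter_logits(logits, top_k=50, top_p=0.9):
    # logits: 1-D [vocab] scores for the next token
    # Top-k: drop everything outside the k highest-scoring tokens
    kth = torch.topk(logits, top_k).values[-1]
    logits = logits.masked_fill(logits < kth, float("-inf"))

    # Top-p (nucleus): keep the smallest set of tokens with cumulative prob >= p
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumprobs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    cutoff = cumprobs > top_p
    cutoff[1:] = cutoff[:-1].clone()    # keep the token that crosses the threshold
    cutoff[0] = False                   # always keep at least one token
    logits[sorted_idx[cutoff]] = float("-inf")
    return logits

# Usage: sample the next token from the filtered distribution
# next_id = torch.multinomial(torch.softmax(filter_logits(next_logits), -1), 1)
```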
## Limitations
- No HuggingFace `.generate()` support: Use the custom generation function (see Basic Usage)
- Output Quality: May produce repetitive or nonsensical text for some prompts
- Hardware Requirements: GPU recommended for practical inference
- Context Window: Limited to 2048 tokens (or 512 tokens)
## Example Output
# WIP
## Support Me
You can support me via Ko-fi, or try my Vast.ai template!
## More metadata
- Release date: July 10, 2025