# SnowflakeCore-G1-Tiny-Instruct
A custom GPT-style transformer language model built from scratch using PyTorch.
## Model Overview
SnowflakeCore-G1-Tiny and SnowflakeCore-G1-Tiny-Instruct are GPT-style autoregressive transformer models with ~400M parameters, designed for text generation tasks.
### Key Features
- 2048 token context window for extended conversations
- Mixed precision training (BF16/FP16) for efficiency
- Custom attention implementation with fused operations
- Early stopping mechanisms: N/A
- Gradient accumulation for effective large batch training
## Architecture Specifications

| Component | Value |
|-----------|-------|
| Model Type | Autoregressive Transformer |
| Parameters | ~400M |
| Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Head Dimension | 64 |
| FFN Dimension | 4096 |
| Context Length | 2048 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |
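For reference, the table above maps onto a configuration object roughly like the sketch below. The class and field names are illustrative only and are not the model's actual API.

```python
from dataclasses import dataclass

@dataclass
class SnowflakeCoreConfig:
    # Hypothetical config container mirroring the specification table;
    # names are placeholders, not the repository's real identifiers.
    vocab_size: int = 50257   # GPT-2 tokenizer vocabulary
    n_layers: int = 24        # transformer blocks
    hidden_size: int = 1024   # model (embedding) dimension
    n_heads: int = 16         # attention heads
    head_dim: int = 64        # hidden_size // n_heads
    ffn_dim: int = 4096       # feed-forward inner dimension (4x hidden)
    max_seq_len: int = 2048   # context window in tokens
```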
## Quick Start

### Installation

```bash
pip install torch transformers
```

### Basic Usage
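Because the model does not support HuggingFace's `.generate()` (see Limitations), generation is done with a manual sampling loop. The sketch below is a minimal example, assuming the forward pass returns raw logits of shape `[batch, seq, vocab]`; the checkpoint filename and loading code are placeholders, not the repository's actual API.

```python
import torch
from transformers import GPT2Tokenizer

# The model uses the GPT-2 tokenizer (50,257 tokens)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Placeholder: load the model however your checkpoint is packaged;
# the filename below is illustrative only.
model = torch.load("snowflakecore_g1_tiny_instruct.pt", map_location="cuda")
model.eval()

@torch.no_grad()
def generate(prompt, max_new_tokens=50, temperature=1.0, top_k=50):
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    for _ in range(max_new_tokens):
        logits = model(ids[:, -2048:])            # respect the 2048-token context window
        next_logits = logits[:, -1, :] / temperature
        topk = torch.topk(next_logits, top_k)     # keep only the top-k candidates
        probs = torch.softmax(topk.values, dim=-1)
        next_id = topk.indices.gather(-1, torch.multinomial(probs, 1))
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:  # stop at <|endoftext|> (50256)
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(generate("Explain what a transformer is in one sentence."))
```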
## Training Details

### Dataset

### Training Configuration
- Framework: PyTorch with mixed precision (BF16/FP16)
- Optimizer: AdamW (learning rate: 2e-4)
- Batch Size: N/A
- Context Window: 2048 tokens or 512 tokens
- Validation Split: N/A
- Early Stopping: N/A
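As a rough illustration of how these settings fit together (AdamW at 2e-4, BF16 autocast, gradient accumulation), a simplified training step might look like the sketch below. The accumulation factor, clipping value, `model`, and `train_loader` are placeholders, since the actual batch size and loop are not published.

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
accum_steps = 8  # placeholder: effective batch = micro-batch x accum_steps

model.train()
for step, batch in enumerate(train_loader):
    input_ids = batch["input_ids"].to("cuda")        # up to 2048 tokens per sequence
    with torch.autocast("cuda", dtype=torch.bfloat16):
        logits = model(input_ids[:, :-1])             # assumes raw logits output
        loss = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),             # next-token targets
        )
    (loss / accum_steps).backward()                   # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```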
### Performance Monitoring
- Training loss tracked per epoch with perplexity calculation
- Full validation after each epoch
- Step-level monitoring every 500 steps
- Comprehensive metrics saved in `training_metrics.json`
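Perplexity here is simply the exponential of the mean cross-entropy loss. Below is a minimal sketch of how such per-epoch metrics might be appended to `training_metrics.json`; the exact schema used by this repository is not specified, so the keys are assumptions.

```python
import json
import math

def log_metrics(epoch, train_loss, val_loss, path="training_metrics.json"):
    # Perplexity = exp(mean cross-entropy loss)
    record = {
        "epoch": epoch,
        "train_loss": train_loss,
        "train_perplexity": math.exp(train_loss),
        "val_loss": val_loss,
        "val_perplexity": math.exp(val_loss),
    }
    try:
        with open(path) as f:
            history = json.load(f)
    except FileNotFoundError:
        history = []
    history.append(record)
    with open(path, "w") as f:
        json.dump(history, f, indent=2)
```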
## Technical Implementation

### Attention Mechanism
- Causal Masking: Supports autoregressive generation
- Key Padding Mask: Enables batched inference
- Scaled Dot-Product: Head dimension normalization included
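A minimal sketch of the attention computation described above, using the standard scaled dot-product formulation with both masks; the model's actual fused implementation differs in structure but not in the math.

```python
import math
import torch

def causal_attention(q, k, v, key_padding_mask=None):
    # q, k, v: [batch, heads, seq, head_dim]; head_dim = 64 for this model
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # scale by sqrt(d_head)

    # Causal mask: position i may only attend to positions <= i
    seq = q.size(-2)
    causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))

    # Key padding mask: [batch, seq], True where a token is padding
    if key_padding_mask is not None:
        scores = scores.masked_fill(key_padding_mask[:, None, None, :], float("-inf"))

    return torch.softmax(scores, dim=-1) @ v
```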
### Memory Optimization
- Fused Operations: Reduces memory fragmentation
- Mixed Precision: 30-40% memory reduction
- Gradient Accumulation: Simulates larger batch sizes
- Optional Quantization: Further model compression
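The quantization bullet refers to optional post-training compression. One common way to do this in plain PyTorch is dynamic INT8 quantization of the linear layers, sketched below; whether this repository ships its own quantization path is not specified.

```python
import torch

# Dynamic quantization: weights stored in INT8, activations quantized on the fly.
# This affects only nn.Linear layers and is CPU-oriented in stock PyTorch.
quantized_model = torch.quantization.quantize_dynamic(
    model.eval().cpu(),
    {torch.nn.Linear},
    dtype=torch.qint8,
)
```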
### Training Stability
- Gradient Clipping: Prevents exploding gradients
- Automatic Loss Scaling: Mixed precision stability
- Early Stopping: Prevents overfitting with patience mechanisms
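When training in FP16 rather than BF16, the automatic loss scaling mentioned above is typically handled by `torch.cuda.amp.GradScaler`. The sketch below combines it with gradient clipping; `compute_loss`, `model`, `optimizer`, and the clip value of 1.0 are placeholders/assumptions.

```python
import torch

scaler = torch.cuda.amp.GradScaler()

with torch.autocast("cuda", dtype=torch.float16):
    loss = compute_loss(batch)          # placeholder loss computation

scaler.scale(loss).backward()           # scale the loss to avoid FP16 underflow
scaler.unscale_(optimizer)              # unscale before clipping the real gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
scaler.step(optimizer)                  # skips the step if gradients overflowed
scaler.update()
```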
## System Requirements

### Memory Requirements
- Training: 16-24GB VRAM (precision dependent)
- Inference: 4-6GB VRAM for standard generation
- Context: Maximum 2048 tokens input length
## Generation Parameters
Default configuration:
```json
{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
```
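Since sampling is handled by the custom generation function rather than `.generate()`, the `top_k`/`top_p` defaults above correspond to standard top-k and nucleus filtering of the next-token distribution. A minimal sketch of that filtering step (not the repository's actual code):

```python
import torch

def filter_logits(logits, top_k=50, top_p=0.9):
    # logits: 1-D [vocab] scores for the next token
    # Top-k: drop everything outside the k highest-scoring tokens
    kth = torch.topk(logits, top_k).values[-1]
    logits = logits.masked_fill(logits < kth, float("-inf"))

    # Top-p (nucleus): keep the smallest set of tokens with cumulative prob >= p
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumprobs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    cutoff = cumprobs > top_p
    cutoff[1:] = cutoff[:-1].clone()    # keep the token that crosses the threshold
    cutoff[0] = False                   # always keep at least one token
    logits[sorted_idx[cutoff]] = float("-inf")
    return logits

# Usage: sample the next token from the filtered distribution
# next_id = torch.multinomial(torch.softmax(filter_logits(next_logits), -1), 1)
```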
## Limitations
- No HuggingFace `.generate()` support: Use the custom generation function (see Basic Usage)
- Output Quality: May produce repetitive or nonsensical text for some prompts
- Hardware Requirements: GPU recommended for practical inference
- Context Window: Limited to 2048 tokens (or 512 tokens)
## Example Output
# WIP
## Support Me
You can support me via Ko-fi, or try my Vast.ai template!
## More metadata
- Release date: July 10, 2025