---
datasets:
- tatsu-lab/alpaca
license: other
license_name: ncsam
license_link: https://github.com/FlameF0X/NCSAM
language:
- en
base_model:
- FlameF0X/SnowflakeCore-G1-Tiny
pipeline_tag: text-generation
library_name: transformers
tags:
- pre_train
- custom_code
---

# SnowflakeCore-G1-Tiny-Instruct

A custom GPT-style transformer language model built from scratch using PyTorch.
## Model Overview

SnowflakeCore-G1-Tiny and SnowflakeCore-G1-Tiny-Instruct are GPT-style autoregressive transformer models with ~407M parameters (407,334,912), designed for text generation tasks.
## Key Features

- 2048 token context window for extended conversations
- Mixed precision training (BF16/FP16) for efficiency
- Custom attention implementation with fused operations
- Early stopping mechanisms: N/A
- Gradient accumulation for effective large batch training
## Architecture Specifications

| Component | Value |
|---|---|
| Model Type | Autoregressive Transformer |
| Parameters | ~407M |
| Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Head Dimension | 64 |
| FFN Dimension | 4096 |
| Context Length | 2048 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |
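For reference, the reported parameter count is consistent with a standard GPT-2-style layout (learned position embeddings, biased linear layers, two LayerNorms per block, a final LayerNorm, and an untied output head). The back-of-envelope count below is a sketch under those assumptions, not the model's actual code:

```python
# Rough parameter count for a GPT-2-style stack matching the table above.
# Layout details (biases, untied LM head, learned position embeddings) are assumptions.
vocab, d_model, n_layers, d_ff, ctx = 50257, 1024, 24, 4096, 2048

tok_emb = vocab * d_model                      # token embeddings
pos_emb = ctx * d_model                        # position embeddings
attn = 4 * d_model * d_model + 4 * d_model     # Q, K, V, O weights + biases
ffn = 2 * d_model * d_ff + d_ff + d_model      # two projections + biases
norms = 2 * 2 * d_model                        # two LayerNorms (scale + shift)
block = attn + ffn + norms

total = tok_emb + pos_emb + n_layers * block + 2 * d_model + vocab * d_model
print(f"{total:,}")                            # 407,334,912 -- matches the reported count
```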
## Quick Start

### Installation

```bash
pip install torch transformers  # if not already installed
```
### Basic Usage

N/A (a hedged sketch follows below).
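Since the card does not yet ship an official snippet, here is a minimal sketch only. It assumes the repository loads through `AutoTokenizer`/`AutoModelForCausalLM` with `trust_remote_code=True` (suggested by the `custom_code` tag) and that the model output exposes `.logits`; it uses a manual greedy decoding loop because the Limitations section notes that HuggingFace `.generate()` is not supported.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: repo id and trust_remote_code loading path; adjust as needed.
repo = "FlameF0X/SnowflakeCore-G1-Tiny-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

prompt = "Write a short poem about snow."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Manual greedy decoding loop, since .generate() is reported as unsupported.
with torch.no_grad():
    for _ in range(50):                            # max_new_tokens
        logits = model(input_ids).logits           # assumed output shape [1, seq_len, vocab]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```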
## Training Details

### Dataset

- Source: [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) (see the loading example below)
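To inspect the data side, the dataset listed in the card metadata can be pulled with the `datasets` library:

```python
from datasets import load_dataset

# tatsu-lab/alpaca is the dataset listed in the card metadata.
ds = load_dataset("tatsu-lab/alpaca", split="train")
print(ds[0]["instruction"], ds[0]["output"])
```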
### Training Configuration

- Framework: PyTorch with mixed precision (BF16/FP16); a minimal sketch of this setup follows below
- Optimizer: AdamW (learning rate: 2e-4)
- Batch Size: N/A
- Context Window: 2048 tokens or 512 tokens
- Validation Split: N/A
- Early Stopping: N/A
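The actual training script is not published; the following is a minimal sketch of how the listed pieces (AdamW at 2e-4, BF16/FP16 autocast, gradient accumulation, gradient clipping, loss scaling) are commonly wired together in PyTorch. The toy model, data, and accumulation factor are placeholders, and a CUDA device is assumed.

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs end to end; replace with the real model/data.
model = nn.Linear(32, 32).cuda()
data = [torch.randn(4, 32, device="cuda") for _ in range(16)]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)   # learning rate from the card
scaler = torch.cuda.amp.GradScaler()                         # loss scaling (mainly needed for FP16)
accum_steps = 4                                              # assumed accumulation factor

model.train()
for step, x in enumerate(data):
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(x).pow(2).mean() / accum_steps          # dummy loss, divided for accumulation
    scaler.scale(loss).backward()

    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```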
### Performance Monitoring

- Training loss tracked per epoch with perplexity calculation (illustrated below)
- Full validation after each epoch
- Step-level monitoring every 500 steps
- Comprehensive metrics saved in `training_metrics.json`
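As a reminder of how the logged quantities relate, perplexity is the exponential of the mean cross-entropy loss; the loss value below is made up purely for illustration:

```python
import math

mean_cross_entropy = 3.2                 # example value, not a reported metric
perplexity = math.exp(mean_cross_entropy)
print(round(perplexity, 2))              # ~24.53
```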
## Technical Implementation

### Attention Mechanism

- Causal Masking: Supports autoregressive generation
- Key Padding Mask: Enables batched inference
- Scaled Dot-Product: Head dimension normalization included (see the sketch below)
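The model's own attention code is not reproduced here; the sketch below shows the standard scaled dot-product formulation with a causal mask and a key padding mask, using PyTorch's built-in `scaled_dot_product_attention` (which fuses these operations). Shapes and names are illustrative; only the head count and head dimension come from the table above.

```python
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 2, 16, 8, 64     # 16 heads x 64 dims, as in the table
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# Key padding mask: True for real tokens, False for padding.
key_padding = torch.ones(batch, seq, dtype=torch.bool)
key_padding[1, 6:] = False                     # pretend sequence 1 is padded after position 5

# Causal mask: each query may attend only to itself and earlier keys.
causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))

# Combine both masks and broadcast over heads; True means "may attend".
attn_mask = causal[None, None, :, :] & key_padding[:, None, None, :]

# The 1/sqrt(head_dim) scaling is applied inside scaled_dot_product_attention.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)                               # torch.Size([2, 16, 8, 64])
```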
### Memory Optimization

- Fused Operations: Reduces memory fragmentation
- Mixed Precision: 30-40% memory reduction
- Gradient Accumulation: Simulates larger batch sizes
- Optional Quantization: Further model compression (see the sketch below)
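The card does not say which quantization scheme is meant. As one hedged example, PyTorch's post-training dynamic quantization can shrink the linear layers of a CPU copy of a model; the tiny stand-in network below is not the actual architecture:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# Toy stand-in for the transformer; replace with the real model moved to CPU.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# Dynamic quantization stores Linear weights in int8 and dequantizes on the fly
# during matmuls (CPU inference only).
quantized = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```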
### Training Stability

- Gradient Clipping: Prevents exploding gradients
- Automatic Loss Scaling: Mixed precision stability
- Early Stopping: Prevents overfitting with patience mechanisms (see the sketch below)
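Gradient clipping and loss scaling are already shown in the training sketch above. A patience-based early stopping check is typically just a counter on the validation loss; the patience value and loss values here are illustrative, not the card's settings:

```python
best_val_loss = float("inf")
patience, bad_epochs = 3, 0                    # patience value is an assumption

for epoch, val_loss in enumerate([2.9, 2.7, 2.71, 2.72, 2.75]):  # dummy validation losses
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
        # save_checkpoint(model)               # hypothetical helper
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}")
            break
```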
## System Requirements

### Memory Requirements

- Training: 16-24GB VRAM (precision dependent)
- Inference: 4-6GB VRAM for standard generation
- Context: Maximum 2048 tokens input length
## Generation Parameters

Default configuration:

```json
{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
```
## Limitations

- No HuggingFace `.generate()` support: Use the custom generation function
- Output Quality: May produce repetitive or nonsensical text for some prompts
- Hardware Requirements: GPU recommended for practical inference
- Context Window: Limited to 2048 tokens (or 512 tokens)
## Example Output

WIP
## License & Acknowledgments

- License: NCSAM
- Framework: Built using PyTorch
- Dataset: [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)
## Support Me

You can support me via Ko-fi or you can try my Vast.ai template!
## More meta-data

- Release date: July 10, 2025