---
datasets:
- tatsu-lab/alpaca
license: other
license_name: ncsam
license_link: https://github.com/FlameF0X/NCSAM
language:
- en
base_model:
- FlameF0X/SnowflakeCore-G1-Tiny
pipeline_tag: text-generation
library_name: transformers
tags:
- pre_train
- custom_code
---

# SnowflakeCore-G1-Tiny-Instruct

A custom GPT-style transformer language model built from scratch using PyTorch.

## Model Overview

SnowflakeCore-G1-Tiny-Instruct is the instruction-tuned variant of SnowflakeCore-G1-Tiny; both are GPT-style autoregressive transformer models with **~407M parameters** (407,334,912) designed for text generation tasks.

### Key Features

- **2048 token context window** for extended conversations
- **Mixed precision training** (BF16/FP16) for efficiency
- **Custom attention implementation** with fused operations
- **Early stopping mechanisms**: N/A
- **Gradient accumulation** for effective large batch training

### Architecture Specifications

| Component | Value |
|-----------|-------|
| Model Type | Autoregressive Transformer |
| Parameters | ~407M |
| Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Head Dimension | 64 |
| FFN Dimension | 4096 |
| Context Length | 2048 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |

## Quick Start

### Installation

```bash
pip install torch transformers  # if not already installed
```

### Basic Usage

```python
# N/A
```

An official usage snippet has not been published yet; an unofficial, hedged generation sketch is provided after the Example Output section below.

## Training Details

### Dataset

- **Source**: [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) (as listed in the card metadata)

### Training Configuration

- **Framework**: PyTorch with mixed precision (BF16/FP16)
- **Optimizer**: AdamW (learning rate: 2e-4)
- **Batch Size**: N/A
- **Context Window**: 2048 or 512 tokens
- **Validation Split**: N/A
- **Early Stopping**: N/A

### Performance Monitoring

- Training loss tracked per epoch, with perplexity calculated alongside it
- Full validation pass after each epoch
- Step-level monitoring every 500 steps
- Comprehensive metrics saved in `training_metrics.json`

## Technical Implementation

### Attention Mechanism

- **Causal Masking**: Supports autoregressive generation
- **Key Padding Mask**: Enables batched inference
- **Scaled Dot-Product**: Includes head-dimension normalization

### Memory Optimization

- **Fused Operations**: Reduce memory fragmentation
- **Mixed Precision**: 30-40% memory reduction
- **Gradient Accumulation**: Simulates larger batch sizes
- **Optional Quantization**: Further model compression

### Training Stability

- **Gradient Clipping**: Prevents exploding gradients
- **Automatic Loss Scaling**: Keeps mixed-precision training stable
- **Early Stopping**: Prevents overfitting via a patience mechanism

## System Requirements

### Memory Requirements

- **Training**: 16-24 GB VRAM (precision dependent)
- **Inference**: 4-6 GB VRAM for standard generation
- **Context**: Maximum input length of 2048 tokens

### Generation Parameters

Default configuration:

```json
{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
```

## Limitations

- **No HuggingFace `.generate()` support**: Use a custom generation function (see the sketch below)
- **Output Quality**: May produce repetitive or nonsensical text for some prompts
- **Hardware Requirements**: GPU recommended for practical inference
- **Context Window**: Limited to 2048 tokens (or 512 tokens)

## Example Output

```
# WIP
```
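## Unofficial Generation Sketch

The card states that HuggingFace's `.generate()` is not supported and that a custom generation function is needed, but that function is not reproduced here. The following is a minimal, unofficial sketch of what such a sampling loop could look like, reusing the default generation parameters listed above. Everything model-specific in it is an assumption: the repo id `FlameF0X/SnowflakeCore-G1-Tiny-Instruct`, loading through `AutoTokenizer`/`AutoModelForCausalLM` with `trust_remote_code=True`, a forward pass that returns `.logits`, and the helper name `sample` are all hypothetical, not the author's implementation.

```python
# Unofficial sketch: manual top-k / top-p sampling loop.
# Assumptions: the checkpoint loads via AutoModelForCausalLM with trust_remote_code=True
# and its forward pass returns an object with a .logits attribute.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "FlameF0X/SnowflakeCore-G1-Tiny-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

@torch.no_grad()
def sample(prompt, max_new_tokens=50, temperature=1.0, top_k=50, top_p=0.9, eos_token_id=50256):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :] / temperature     # next-token logits
        # Top-k filtering: keep only the k most likely tokens.
        topk_vals, topk_idx = torch.topk(logits, top_k)
        probs = torch.softmax(topk_vals, dim=-1)
        # Top-p (nucleus) filtering within the top-k candidates.
        sorted_probs, sorted_idx = torch.sort(probs, descending=True)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        sorted_probs[cumulative - sorted_probs > top_p] = 0.0
        sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
        # Sample a candidate and map it back to a vocabulary id.
        choice = torch.multinomial(sorted_probs, num_samples=1)
        next_token = topk_idx.gather(-1, sorted_idx.gather(-1, choice))
        ids = torch.cat([ids, next_token], dim=-1)
        if next_token.item() == eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(sample("Write a short haiku about snow."))
```

The loop applies temperature scaling, then top-k filtering, then nucleus (top-p) filtering before sampling, mirroring the default configuration above (`temperature=1.0`, `top_k=50`, `top_p=0.9`, `max_new_tokens=50`).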
## License & Acknowledgments

- **License**: [NCSAM](https://github.com/FlameF0X/NCSAM)
- **Framework**: Built using PyTorch
- **Dataset**: tatsu-lab/alpaca

## Support Me

You can support me via [Ko-fi](https://ko-fi.com/flamef0x), or try my [Vast.ai](https://cloud.vast.ai/?ref_id=222345&creator_id=222345&name=Efficient%20Pretraining%20GPU%20Template) template!

### More meta-data

- Release date: July 10, 2025