base_model:
- FlameF0X/SnowflakeCore-G1-Tiny
pipeline_tag: text-generation
library_name: transformers
tags:
- pre_train
- custom_code
---

# SnowflakeCore-G1-Tiny-Instruct

A custom GPT-style transformer language model built from scratch using PyTorch.

## Model Overview

SnowflakeCore-G1-Tiny and SnowflakeCore-G1-Tiny-Instruct are GPT-style autoregressive transformer models with **~407M parameters** (407,334,912), designed for text generation tasks.

### Key Features
- **2048 token context window** for extended conversations
- **Mixed precision training** (BF16/FP16) for efficiency
- **Custom attention implementation** with fused operations
- **Early stopping mechanisms** (N/A)
- **Gradient accumulation** for effective large-batch training

### Architecture Specifications

| Component | Value |
|-----------|-------|
| Model Type | Autoregressive Transformer |
| Parameters | ~407M |
| Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Head Dimension | 64 |
| FFN Dimension | 4096 |
| Context Length | 2048 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |
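
As a sanity check, the table above is consistent with the exact parameter count if one assumes learned positional embeddings, untied token-embedding and LM-head weights, biases on every linear projection, and two LayerNorms per block plus a final one. These details are not stated in the repository, so treat the breakdown below as an illustration rather than the definitive architecture:

```python
# Hypothetical breakdown of the ~407M figure; the assumptions above are not
# confirmed by the repo, but the arithmetic lands exactly on 407,334,912.
d_model, n_layers, d_ffn, vocab, ctx = 1024, 24, 4096, 50_257, 2048

attn = 4 * (d_model * d_model + d_model)                        # Q, K, V, O projections
ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)   # two FFN projections
norms = 2 * 2 * d_model                                         # two LayerNorms per block
per_block = attn + ffn + norms

total = (
    n_layers * per_block
    + vocab * d_model   # token embeddings
    + ctx * d_model     # learned position embeddings
    + 2 * d_model       # final LayerNorm
    + vocab * d_model   # untied LM head (no bias)
)
print(f"{total:,}")  # 407,334,912
```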

## Quick Start

### Installation

```bash
pip install torch transformers  # if not already installed
```

### Basic Usage

```python
# N/A
```
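
An official usage example is still marked N/A above. Until one lands, the following is a minimal sketch of what usage could look like, assuming the checkpoint loads through `transformers` with `trust_remote_code=True` (the model ships custom code) and that the forward pass returns GPT-style logits. Generation uses a hand-rolled sampling loop because `.generate()` is not supported (see Limitations); the repo id and every call here are unverified assumptions, not the project's documented API.

```python
# Hedged sketch only: loading details and output format may differ in practice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "FlameF0X/SnowflakeCore-G1-Tiny-Instruct"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).eval()

input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids

# Manual top-k sampling loop, mirroring the default generation parameters below
# (temperature 1.0, top_k 50, max_new_tokens 50, eos_token_id 50256).
with torch.no_grad():
    for _ in range(50):
        logits = model(input_ids).logits[:, -1, :]   # assumes a .logits field on the output
        probs = torch.softmax(logits / 1.0, dim=-1)
        top_probs, top_idx = torch.topk(probs, k=50)
        next_id = top_idx.gather(-1, torch.multinomial(top_probs, 1))
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```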

## Training Details

### Dataset
- **Source**:

### Training Configuration
- **Framework**: PyTorch with mixed precision (BF16/FP16)
- **Optimizer**: AdamW (learning rate: 2e-4)
- **Batch Size**: N/A
- **Context Window**: 2048 tokens or 512 tokens
- **Validation Split**: N/A
- **Early Stopping**: N/A
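
Putting the configuration above together (AdamW at 2e-4, BF16/FP16 autocast, gradient accumulation, plus the gradient clipping listed under Training Stability), one training step plausibly looks like the sketch below. The actual script, batch size, and accumulation factor are not published, so `model`, `dataloader`, and `ACCUM_STEPS` are placeholders.

```python
# Illustrative training step, not the actual SnowflakeCore training code.
import torch

ACCUM_STEPS = 8  # assumed value; the real accumulation factor is undocumented
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

model.train()
optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    # Mixed-precision forward/backward (BF16 shown; FP16 would add a GradScaler)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(batch["input_ids"], labels=batch["labels"]).loss
    (loss / ACCUM_STEPS).backward()                                # gradient accumulation

    if (step + 1) % ACCUM_STEPS == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)   # gradient clipping
        optimizer.step()
        optimizer.zero_grad()
```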

### Performance Monitoring
- Training loss tracked per epoch with perplexity calculation
- Full validation after each epoch
- Step-level monitoring every 500 steps
- Comprehensive metrics saved in `training_metrics.json`
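
The perplexity here presumably follows the standard definition, the exponential of the mean per-token cross-entropy:

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    # exp of the average per-token negative log-likelihood
    return math.exp(mean_cross_entropy)
```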

## Technical Implementation

### Attention Mechanism
- **Causal Masking**: Supports autoregressive generation
- **Key Padding Mask**: Enables batched inference
- **Scaled Dot-Product**: Head dimension normalization included
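
The three bullets above describe standard masked scaled dot-product attention. A minimal reference version (not the repository's fused implementation) looks roughly like this:

```python
import math
import torch

def masked_attention(q, k, v, key_padding_mask=None):
    """q, k, v: (batch, heads, seq, head_dim); key_padding_mask: (batch, seq), True = padding."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # head-dimension normalization

    # Causal mask: each position attends only to itself and earlier positions.
    seq = q.size(-2)
    causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))

    # Key padding mask: ignore padded tokens when batching variable-length inputs.
    if key_padding_mask is not None:
        scores = scores.masked_fill(key_padding_mask[:, None, None, :], float("-inf"))

    return torch.softmax(scores, dim=-1) @ v
```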

### Memory Optimization
- **Fused Operations**: Reduces memory fragmentation
- **Mixed Precision**: 30-40% memory reduction
- **Gradient Accumulation**: Simulates larger batch sizes
- **Optional Quantization**: Further model compression
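
The quantization method is not specified. One low-effort option, assuming the loaded model behaves like a regular `nn.Module`, is post-training dynamic quantization of the linear layers (CPU inference only):

```python
import torch

# int8 dynamic quantization of Linear layers: weights are stored in int8,
# activations are quantized on the fly at inference time.
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```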

### Training Stability
- **Gradient Clipping**: Prevents exploding gradients
- **Automatic Loss Scaling**: Mixed precision stability
- **Early Stopping**: Prevents overfitting with patience mechanisms
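
On the FP16 path, automatic loss scaling presumably means `torch.cuda.amp.GradScaler` (or an equivalent custom scaler). Combined with clipping, and with `model`, `batch`, and `optimizer` as placeholders from the earlier sketch, a step would look roughly like:

```python
import torch

scaler = torch.cuda.amp.GradScaler()   # rescales the loss to avoid FP16 gradient underflow

with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(batch["input_ids"], labels=batch["labels"]).loss

scaler.scale(loss).backward()
scaler.unscale_(optimizer)                                   # so clipping sees real gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
scaler.step(optimizer)                                       # skipped automatically on inf/NaN
scaler.update()
optimizer.zero_grad()
```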

## System Requirements

### Memory Requirements
- **Training**: 16-24GB VRAM (precision dependent)
- **Inference**: 4-6GB VRAM for standard generation
- **Context**: Maximum 2048 tokens input length

### Generation Parameters

Default configuration:

```json
{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
```

## Limitations

- **No HuggingFace `.generate()` support**: Use the custom generation function instead
- **Output Quality**: May produce repetitive or nonsensical text for some prompts
- **Hardware Requirements**: GPU recommended for practical inference
- **Context Window**: Limited to 2048 tokens (or 512 tokens)

## Example Output

```
# WIP
```

## License & Acknowledgments

- **License**: [NCSAM](https://github.com/FlameF0X/NCSAM)
- **Framework**: Built using PyTorch
- **Dataset**:

## Support Me

You can support me via [Ko-fi](https://ko-fi.com/flamef0x), or try my [Vast.ai](https://cloud.vast.ai/?ref_id=222345&creator_id=222345&name=Efficient%20Pretraining%20GPU%20Template) template!

### More meta-data
- Release date: July 10, 2025