---
datasets:
- tatsu-lab/alpaca
license: other
license_name: ncsam
license_link: https://github.com/FlameF0X/NCSAM
language:
- en
base_model:
- FlameF0X/SnowflakeCore-G1-Tiny
pipeline_tag: text-generation
library_name: transformers
tags:
- pre_train
- custom_code
---

# SnowflakeCore-G1-Tiny-Instruct

A custom GPT-style transformer language model built from scratch using PyTorch.

## Model Overview

SnowflakeCore-G1-Tiny-Instruct is the instruction-tuned variant of SnowflakeCore-G1-Tiny; both are GPT-style autoregressive transformer models with **~407M parameters** (407,334,912) designed for text generation tasks.

### Key Features

- **2048 token context window** for extended conversations
- **Mixed precision training** (BF16/FP16) for efficiency
- **Custom attention implementation** with fused operations
- **Early stopping mechanisms**: N/A
- **Gradient accumulation** for effective large batch training

### Architecture Specifications

| Component | Value |
|-----------|-------|
| Model Type | Autoregressive Transformer |
| Parameters | ~407M |
| Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Head Dimension | 64 |
| FFN Dimension | 4096 |
| Context Length | 2048 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |

## Quick Start

### Installation

```bash
pip install torch transformers  # if not already installed
```

### Basic Usage

```python
# N/A
```

An official usage snippet has not been published yet; an unofficial, hedged generation sketch is provided after the Example Output section below.

## Training Details

### Dataset

- **Source**: [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) (as listed in the card metadata)

### Training Configuration

- **Framework**: PyTorch with mixed precision (BF16/FP16)
- **Optimizer**: AdamW (learning rate: 2e-4)
- **Batch Size**: N/A
- **Context Window**: 2048 or 512 tokens
- **Validation Split**: N/A
- **Early Stopping**: N/A

### Performance Monitoring

- Training loss tracked per epoch, with perplexity calculated alongside it
- Full validation pass after each epoch
- Step-level monitoring every 500 steps
- Comprehensive metrics saved in `training_metrics.json`

## Technical Implementation

### Attention Mechanism

- **Causal Masking**: Supports autoregressive generation
- **Key Padding Mask**: Enables batched inference
- **Scaled Dot-Product**: Includes head-dimension normalization

### Memory Optimization

- **Fused Operations**: Reduce memory fragmentation
- **Mixed Precision**: 30-40% memory reduction
- **Gradient Accumulation**: Simulates larger batch sizes
- **Optional Quantization**: Further model compression

### Training Stability

- **Gradient Clipping**: Prevents exploding gradients
- **Automatic Loss Scaling**: Keeps mixed-precision training stable
- **Early Stopping**: Prevents overfitting via a patience mechanism

## System Requirements

### Memory Requirements

- **Training**: 16-24 GB VRAM (precision dependent)
- **Inference**: 4-6 GB VRAM for standard generation
- **Context**: Maximum input length of 2048 tokens

### Generation Parameters

Default configuration:

```json
{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
```

## Limitations

- **No HuggingFace `.generate()` support**: Use a custom generation function (see the sketch below)
- **Output Quality**: May produce repetitive or nonsensical text for some prompts
- **Hardware Requirements**: GPU recommended for practical inference
- **Context Window**: Limited to 2048 tokens (or 512 tokens)

## Example Output

```
# WIP
```
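## Unofficial Generation Sketch

The card states that HuggingFace's `.generate()` is not supported and that a custom generation function is needed, but that function is not reproduced here. The following is a minimal, unofficial sketch of what such a sampling loop could look like, reusing the default generation parameters listed above. Everything model-specific in it is an assumption: the repo id `FlameF0X/SnowflakeCore-G1-Tiny-Instruct`, loading through `AutoTokenizer`/`AutoModelForCausalLM` with `trust_remote_code=True`, a forward pass that returns `.logits`, and the helper name `sample` are all hypothetical, not the author's implementation.

```python
# Unofficial sketch: manual top-k / top-p sampling loop.
# Assumptions: the checkpoint loads via AutoModelForCausalLM with trust_remote_code=True
# and its forward pass returns an object with a .logits attribute.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "FlameF0X/SnowflakeCore-G1-Tiny-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

@torch.no_grad()
def sample(prompt, max_new_tokens=50, temperature=1.0, top_k=50, top_p=0.9, eos_token_id=50256):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :] / temperature     # next-token logits
        # Top-k filtering: keep only the k most likely tokens.
        topk_vals, topk_idx = torch.topk(logits, top_k)
        probs = torch.softmax(topk_vals, dim=-1)
        # Top-p (nucleus) filtering within the top-k candidates.
        sorted_probs, sorted_idx = torch.sort(probs, descending=True)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        sorted_probs[cumulative - sorted_probs > top_p] = 0.0
        sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
        # Sample a candidate and map it back to a vocabulary id.
        choice = torch.multinomial(sorted_probs, num_samples=1)
        next_token = topk_idx.gather(-1, sorted_idx.gather(-1, choice))
        ids = torch.cat([ids, next_token], dim=-1)
        if next_token.item() == eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(sample("Write a short haiku about snow."))
```

The loop applies temperature scaling, then top-k filtering, then nucleus (top-p) filtering before sampling, mirroring the default configuration above (`temperature=1.0`, `top_k=50`, `top_p=0.9`, `max_new_tokens=50`).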
## License & Acknowledgments

- **License**: [NCSAM](https://github.com/FlameF0X/NCSAM)
- **Framework**: Built using PyTorch
- **Dataset**: tatsu-lab/alpaca

## Support Me

You can support me via [Ko-fi](https://ko-fi.com/flamef0x), or try my [Vast.ai](https://cloud.vast.ai/?ref_id=222345&creator_id=222345&name=Efficient%20Pretraining%20GPU%20Template) template!

### More meta-data

- Release date: July 10, 2025