---
datasets:
- tatsu-lab/alpaca
license: apache-2.0
language:
- en
base_model:
- FlameF0X/SnowflakeCore-G1-Tiny
pipeline_tag: text-generation
library_name: transformers
tags:
- pre-train
- custom_code
---
# SnowflakeCore-G1-Tiny-Instruct
A custom GPT-style transformer language model built from scratch using PyTorch.
## Model Overview
SnowflakeCore-G1-Tiny and SnowflakeCore-G1-Tiny-Instruct are GPT-style autoregressive transformer models with **~400M parameters**, designed for text generation tasks.
### Key Features
- **2048 token context window** for extended conversations
- **Mixed precision training** (BF16/FP16) for efficiency
- **Custom attention implementation** with fused operations
- **Early stopping mechanisms**: N/A
- **Gradient accumulation** for effective large batch training
### Architecture Specifications
| Component | Value |
|-----------|-------|
| Model Type | Autoregressive Transformer |
| Parameters | ~400M |
| Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Head Dimension | 64 |
| FFN Dimension | 4096 |
| Context Length | 2048 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |
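As a sanity check, the dimensions in the table roughly reproduce the stated parameter count. The sketch below is a back-of-the-envelope estimate that assumes an untied output head and ignores biases, LayerNorm, and positional embeddings:

```python
# Back-of-the-envelope parameter count from the table above.
# Assumes an untied LM head; biases, LayerNorm, and position embeddings are ignored.
n_layers, hidden, ffn, vocab = 24, 1024, 4096, 50_257

attn_per_layer = 4 * hidden * hidden            # Q, K, V and output projections
ffn_per_layer = 2 * hidden * ffn                # up- and down-projection
blocks = n_layers * (attn_per_layer + ffn_per_layer)   # ~302M
embeddings = vocab * hidden                     # ~51.5M
lm_head = vocab * hidden                        # ~51.5M if untied from the embeddings

total = blocks + embeddings + lm_head
print(f"{total / 1e6:.0f}M parameters")         # roughly 405M, consistent with ~400M
```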
## Quick Start
### Installation
```bash
pip install torch transformers # if not already installed
```
### Basic Usage
```python
# N/A
```
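An official usage snippet is not published yet. Until then, here is a minimal sketch, assuming the checkpoint is hosted as `FlameF0X/SnowflakeCore-G1-Tiny-Instruct`, loads through `AutoModelForCausalLM` with `trust_remote_code=True`, and returns standard `.logits`; because `.generate()` is not supported (see Limitations), tokens are sampled with a manual loop:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumptions: the repo id, trust_remote_code loading, and a standard .logits output are unverified.
model_id = "FlameF0X/SnowflakeCore-G1-Tiny-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

prompt = "Explain what a transformer language model is."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Manual sampling loop, since HuggingFace .generate() is not supported.
with torch.no_grad():
    for _ in range(50):                                    # max_new_tokens
        logits = model(input_ids).logits[:, -1, :]         # next-token logits
        probs = torch.softmax(logits / 1.0, dim=-1)        # temperature = 1.0
        next_token = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```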
## Training Details
### Dataset
- **Source**: [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)
### Training Configuration
- **Framework**: PyTorch with mixed precision (BF16/FP16)
- **Optimizer**: AdamW (learning rate: 2e-4)
- **Batch Size**: N/A
- **Context Window**: 2048 tokens or 512 tokens
- **Validation Split**: N/A
- **Early Stopping**: N/A
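A minimal sketch of this configuration in PyTorch (the module is a placeholder standing in for the real model):

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the real network (CUDA required for the autocast below).
model = nn.Linear(1024, 50_257).cuda()

# AdamW at the stated learning rate of 2e-4; other hyperparameters are library defaults.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

# BF16 mixed-precision forward pass; FP16 would additionally need a GradScaler.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    logits = model(torch.randn(4, 1024, device="cuda"))
```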
### Performance Monitoring
- Training loss tracked per epoch with perplexity calculation
- Full validation after each epoch
- Step-level monitoring every 500 steps
- Comprehensive metrics saved in `training_metrics.json`
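Perplexity follows directly from the mean token-level cross-entropy loss. A small illustration with arbitrary loss values (the keys written to `training_metrics.json` are illustrative only, since the file's schema is not documented here):

```python
import json
import math

# Perplexity is the exponential of the mean token-level cross-entropy loss.
step_losses = [3.2, 3.0, 2.9, 2.8]                 # arbitrary example values
mean_loss = sum(step_losses) / len(step_losses)
perplexity = math.exp(mean_loss)
print(f"loss={mean_loss:.3f}  perplexity={perplexity:.1f}")

# Illustrative keys only; the real schema of training_metrics.json may differ.
with open("training_metrics.json", "w") as f:
    json.dump({"epoch": 1, "train_loss": mean_loss, "perplexity": perplexity}, f, indent=2)
```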
## Technical Implementation
### Attention Mechanism
- **Causal Masking**: Supports autoregressive generation
- **Key Padding Mask**: Enables batched inference
- **Scaled Dot-Product**: Head dimension normalization included
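The same behaviour can be illustrated with PyTorch's built-in scaled dot-product attention; this is a generic demonstration of causal plus key-padding masking, not the model's actual fused implementation:

```python
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 2, 16, 8, 64

q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# Key padding mask: True marks real tokens, False marks padding.
key_padding = torch.ones(batch, seq, dtype=torch.bool)
key_padding[1, 6:] = False                    # second sequence padded after 6 tokens

# Combine the causal mask and the padding mask into one boolean attention mask.
causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
attn_mask = causal[None, None, :, :] & key_padding[:, None, None, :]

# Scaled dot-product attention normalises scores by sqrt(head_dim) internally.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)                              # torch.Size([2, 16, 8, 64])
```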
### Memory Optimization
- **Fused Operations**: Reduces memory fragmentation
- **Mixed Precision**: 30-40% memory reduction
- **Gradient Accumulation**: Simulates larger batch sizes
- **Optional Quantization**: Further model compression
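A sketch of how gradient accumulation combines with BF16 autocast (the tiny model, random batches, and accumulation factor are placeholders, not the real training setup):

```python
import torch
import torch.nn as nn

# Gradient accumulation with BF16 autocast (requires a CUDA device).
model = nn.Linear(1024, 50_257).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 8                                    # effective batch = micro-batch x accum_steps
optimizer.zero_grad(set_to_none=True)

for step in range(32):
    x = torch.randn(4, 1024, device="cuda")
    y = torch.randint(0, 50_257, (4,), device="cuda")

    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y) / accum_steps  # scale so accumulated gradients average out
    loss.backward()                                # gradients accumulate across micro-batches

    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```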
### Training Stability
- **Gradient Clipping**: Prevents exploding gradients
- **Automatic Loss Scaling**: Mixed precision stability
- **Early Stopping**: Prevents overfitting with patience mechanisms
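A sketch of the FP16 path with automatic loss scaling, gradient clipping, and patience-based early stopping (model, data, clip norm, and patience values are placeholders, not the actual training script):

```python
import torch
import torch.nn as nn

# FP16 loss scaling, gradient clipping, and patience-based early stopping (CUDA required).
model = nn.Linear(1024, 50_257).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

best_val_loss, patience, bad_epochs = float("inf"), 3, 0

for epoch in range(20):
    x = torch.randn(8, 1024, device="cuda")
    y = torch.randint(0, 50_257, (8,), device="cuda")

    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                                # so clipping sees real gradient norms
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)   # gradient clipping
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad(set_to_none=True)

    val_loss = loss.item()                                    # stand-in for a full validation pass
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break                                             # early stopping triggered
```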
## System Requirements
### Memory Requirements
- **Training**: 16-24GB VRAM (precision dependent)
- **Inference**: 4-6GB VRAM for standard generation
- **Context**: maximum input length of 2048 tokens
### Generation Parameters
Default configuration:
```json
{
"do_sample": true,
"temperature": 1.0,
"top_p": 0.9,
"top_k": 50,
"max_new_tokens": 50,
"pad_token_id": 50256,
"eos_token_id": 50256
}
```
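Since `.generate()` is unavailable, these parameters have to be applied inside a custom sampling step. One generic way to combine temperature, top-k, and top-p filtering over a single logits vector (not the model's shipped code):

```python
import torch

def sample_next_token(logits, temperature=1.0, top_k=50, top_p=0.9):
    """Apply temperature, top-k, and nucleus (top-p) filtering to 1D logits, then sample."""
    logits = logits / temperature

    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))

    # Top-p: keep the smallest set of tokens whose cumulative probability exceeds p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumprobs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    cutoff = cumprobs > top_p
    cutoff[1:] = cutoff[:-1].clone()        # shift right so the crossing token is kept
    cutoff[0] = False                       # always keep the single most likely token
    logits[sorted_idx[cutoff]] = float("-inf")

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

# Example on random logits over the GPT-2 vocabulary.
next_id = sample_next_token(torch.randn(50_257))
print(int(next_id))
```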
## Limitations
- **No HuggingFace `.generate()` support**: use a custom generation function instead
- **Output Quality**: May produce repetitive or nonsensical text for some prompts
- **Hardware Requirements**: GPU recommended for practical inference
- **Context Window**: Limited to 2048 tokens (or 512 tokens)
## Example Output
```
# WIP
```
## Support Me
You can support me via [Ko-fi](https://ko-fi.com/flamef0x) or try my [Vast.ai](https://cloud.vast.ai/?ref_id=222345&creator_id=222345&name=Efficient%20Pretraining%20GPU%20Template) template!
### More metadata
- Release date: July 10, 2025