---
datasets:
- tatsu-lab/alpaca
license: apache-2.0
language:
- en
base_model:
- FlameF0X/SnowflakeCore-G1-Tiny
pipeline_tag: text-generation
library_name: transformers
tags:
- pre-train
- custom_code
---

# SnowflakeCore-G1-Tiny-Instruct

A custom GPT-style transformer language model built from scratch using PyTorch.

## Model Overview

SnowflakeCore-G1-Tiny and SnowflakeCore-G1-Tiny-Instruct are GPT-style autoregressive transformer models with **~400M parameters** designed for text generation tasks.

### Key Features

- **2048 token context window** for extended conversations
- **Mixed precision training** (BF16/FP16) for efficiency
- **Custom attention implementation** with fused operations
- **Early stopping mechanisms**: N/A
- **Gradient accumulation** for effective large batch training

### Architecture Specifications

| Component | Value |
|-----------|-------|
| Model Type | Autoregressive Transformer |
| Parameters | ~400M |
| Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Head Dimension | 64 |
| FFN Dimension | 4096 |
| Context Length | 2048 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |
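
For orientation, the table above can also be read as a plain configuration object. The field names below are illustrative assumptions rather than the repository's actual config keys; they make the derived relations explicit (head dimension = hidden size / attention heads, FFN dimension = 4 × hidden size).

```python
from dataclasses import dataclass

# Illustrative summary of the architecture table; field names are assumptions,
# not the repository's actual configuration keys.
@dataclass
class SnowflakeCoreG1TinyConfig:
    num_layers: int = 24
    hidden_size: int = 1024
    num_attention_heads: int = 16
    head_dim: int = 64        # hidden_size // num_attention_heads
    ffn_dim: int = 4096       # 4 * hidden_size
    context_length: int = 2048
    vocab_size: int = 50257   # GPT-2 tokenizer
```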

## Quick Start

### Installation

```bash
pip install torch transformers  # if not already installed
```

### Basic Usage

```python
# N/A (an unofficial loading and generation sketch follows below)
```
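
An official usage snippet has not been published yet. The following is a minimal, unverified sketch: it assumes the checkpoint loads through the standard `transformers` Auto classes with `trust_remote_code=True` (the card is tagged with custom code) and that the Instruct variant lives at `FlameF0X/SnowflakeCore-G1-Tiny-Instruct`. Because HuggingFace `.generate()` is not supported (see Limitations), it decodes greedily with the raw forward pass.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for the Instruct variant (the base model is
# FlameF0X/SnowflakeCore-G1-Tiny).
model_id = "FlameF0X/SnowflakeCore-G1-Tiny-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

prompt = "Explain what a transformer language model is."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# .generate() is not supported, so decode greedily with the raw forward pass.
with torch.no_grad():
    for _ in range(50):  # max_new_tokens
        logits = model(input_ids).logits
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```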

## Training Details

### Dataset

- **Source**: [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)

### Training Configuration

- **Framework**: PyTorch with mixed precision (BF16/FP16); a minimal training-step sketch follows this list
- **Optimizer**: AdamW (learning rate: 2e-4)
- **Batch Size**: N/A
- **Context Window**: 2048 tokens or 512 tokens
- **Validation Split**: N/A
- **Early Stopping**: N/A
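
The sketch below shows how the settings above fit together in a single PyTorch training step. It is illustrative only: `model` and `train_loader` are assumed to exist, and the accumulation steps and clip norm are placeholders, since the card lists the batch size as N/A.

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
accum_steps = 8   # assumed; the effective batch size is not documented
clip_norm = 1.0   # assumed gradient-clipping threshold

optimizer.zero_grad()
for step, batch in enumerate(train_loader):
    # BF16 autocast; FP16 would additionally use torch.cuda.amp.GradScaler.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss / accum_steps  # assumes labels are in the batch
    loss.backward()
    if (step + 1) % accum_steps == 0:
        # Gradient clipping guards against exploding gradients.
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
        optimizer.step()
        optimizer.zero_grad()
```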

### Performance Monitoring

- Training loss tracked per epoch with perplexity calculation
- Full validation after each epoch
- Step-level monitoring every 500 steps
- Comprehensive metrics saved in `training_metrics.json`

## Technical Implementation

### Attention Mechanism

- **Causal Masking**: Supports autoregressive generation
- **Key Padding Mask**: Enables batched inference
- **Scaled Dot-Product**: Head dimension normalization included (see the sketch after this list)
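
As an illustration only (not the model's actual attention code), the three properties above can be expressed with PyTorch's fused `scaled_dot_product_attention`, which applies the 1/sqrt(head_dim) scaling internally:

```python
import torch
import torch.nn.functional as F

def attend(q, k, v, key_padding_mask=None):
    """q, k, v: (batch, heads, seq, head_dim); key_padding_mask: (batch, seq), True = padding."""
    if key_padding_mask is None:
        # Pure causal attention; the lower-triangular mask is applied internally.
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    seq_len = q.size(-2)
    # Causal mask: True where attention is allowed.
    causal = torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device).tril()
    # Invert the padding mask so True marks real (attendable) tokens, broadcast over heads/queries.
    keep = ~key_padding_mask[:, None, None, :]
    return F.scaled_dot_product_attention(q, k, v, attn_mask=causal & keep)
```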

### Memory Optimization

- **Fused Operations**: Reduces memory fragmentation
- **Mixed Precision**: 30-40% memory reduction
- **Gradient Accumulation**: Simulates larger batch sizes
- **Optional Quantization**: Further model compression (sketched below)
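
For the optional quantization mentioned above, one common approach (not necessarily the one used for this model) is post-training dynamic quantization of the linear layers for CPU inference:

```python
import torch

# Assumes `model` is the loaded SnowflakeCore model; compresses Linear layers to int8.
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```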

### Training Stability

- **Gradient Clipping**: Prevents exploding gradients
- **Automatic Loss Scaling**: Mixed precision stability
- **Early Stopping**: Prevents overfitting with patience mechanisms

## System Requirements

### Memory Requirements

- **Training**: 16-24GB VRAM (precision dependent)
- **Inference**: 4-6GB VRAM for standard generation
- **Context**: Maximum 2048 tokens input length

### Generation Parameters

Default configuration:

```json
{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
```
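
Because HuggingFace `.generate()` is not supported (see Limitations), these defaults would be applied inside a custom sampling loop. The function below is an unofficial illustration of how temperature, top-k, and top-p filtering might be applied to one step's next-token logits; it is not the repository's actual sampler.

```python
import torch

def sample_next_token(logits, temperature=1.0, top_k=50, top_p=0.9):
    """Sample one token id per row from next-token logits of shape (batch, vocab)."""
    logits = logits / temperature
    # Top-k: restrict to the k highest-scoring tokens (returned in descending order).
    topk_vals, topk_idx = torch.topk(logits, top_k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)
    # Top-p (nucleus): keep tokens until the cumulative probability reaches top_p.
    cumulative = torch.cumsum(probs, dim=-1)
    keep = (cumulative - probs) < top_p  # always keeps the single most likely token
    probs = probs * keep
    probs = probs / probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(probs, num_samples=1)  # index within the top-k set
    return topk_idx.gather(-1, choice)                # map back to vocabulary ids

# Example (assuming `model` and `input_ids` from the Basic Usage sketch):
# next_id = sample_next_token(model(input_ids).logits[:, -1, :])
```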

## Limitations

- **No HuggingFace `.generate()` support**: Use custom generation function
- **Output Quality**: May produce repetitive or nonsensical text for some prompts
- **Hardware Requirements**: GPU recommended for practical inference
- **Context Window**: Limited to 2048 tokens (or 512 tokens)

## Example Output

```
# WIP
```

## Support Me

You can support me via [Ko-fi](https://ko-fi.com/flamef0x), or you can try my [Vast.ai](https://cloud.vast.ai/?ref_id=222345&creator_id=222345&name=Efficient%20Pretraining%20GPU%20Template) template!

### More metadata

- Release date: July 10, 2025