---
datasets:
  - tatsu-lab/alpaca
license: other
license_name: ncsam
license_link: https://github.com/FlameF0X/NCSAM
language:
  - en
base_model:
  - FlameF0X/SnowflakeCore-G1-Tiny
pipeline_tag: text-generation
library_name: transformers
tags:
  - pre_train
  - custom_code
---

SnowflakeCore-G1-Tiny-Instruct

A custom GPT-style transformer language model built from scratch using PyTorch.

Model Overview

SnowflakeCore-G1-Tiny and SnowflakeCore-G1-Tiny-Instruct are GPT-style autoregressive transformer models with ~407M parameters (407,334,912), designed for text generation tasks.

Key Features

  • 2048 token context window for extended conversations
  • Mixed precision training (BF16/FP16) for efficiency
  • Custom attention implementation with fused operations
  • Early stopping mechanisms (details N/A)
  • Gradient accumulation for effective large batch training

Architecture Specifications

Component         Value
Model Type        Autoregressive Transformer
Parameters        ~407M (407,334,912)
Layers            24
Hidden Size       1024
Attention Heads   16
Head Dimension    64
FFN Dimension     4096
Context Length    2048 tokens
Vocabulary Size   50,257 (GPT-2 tokenizer)
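
As a quick consistency check, the figures above line up with the quoted parameter count under standard GPT-2-style assumptions (untied input/output embeddings, learned positional embeddings; biases and LayerNorm weights ignored). The numbers below are an estimate, not an official breakdown:

# Rough parameter estimate from the table above (assumes GPT-2-style blocks
# with untied input/output embeddings; biases and LayerNorms omitted).
vocab, d_model, n_layers, ctx, d_ffn = 50257, 1024, 24, 2048, 4096

embed   = vocab * d_model          # token embeddings
pos     = ctx * d_model            # learned position embeddings
attn    = 4 * d_model * d_model    # Q, K, V, output projections per layer
ffn     = 2 * d_model * d_ffn      # up- and down-projections per layer
lm_head = vocab * d_model          # untied output head

total = embed + pos + n_layers * (attn + ffn) + lm_head
print(f"{total:,}")  # prints 407,013,376; biases and norms account for the ~0.3M gap to 407,334,912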

Quick Start

Installation

pip install torch transformers # if not already installed

Basic Usage

# N/A
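
In the absence of an official example, here is a minimal sketch. It assumes the checkpoint loads through transformers with trust_remote_code=True and that the forward pass returns standard causal-LM logits; since .generate() is unsupported (see Limitations), it samples tokens manually.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the repo's custom code exposes a standard causal-LM forward
# pass returning `.logits`; adjust if the released code differs.
repo = "FlameF0X/SnowflakeCore-G1-Tiny-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

prompt = "Write a short poem about snow."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(50):                                     # max_new_tokens = 50
        logits = model(input_ids).logits[:, -1, :]          # next-token logits
        probs = torch.softmax(logits, dim=-1)               # temperature = 1.0
        next_id = torch.multinomial(probs, num_samples=1)   # top-k/top-p omitted here
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))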

Training Details

Dataset

  • Source: tatsu-lab/alpaca (as listed in the model card metadata)

Training Configuration

  • Framework: PyTorch with mixed precision (BF16/FP16)
  • Optimizer: AdamW (learning rate: 2e-4)
  • Batch Size: N/A
  • Context Window: 2048 tokens or 512 tokens
  • Validation Split: N/A
  • Early Stopping: N/A

Performance Monitoring

  • Training loss tracked per epoch with perplexity calculation
  • Full validation after each epoch
  • Step-level monitoring every 500 steps
  • Comprehensive metrics saved in training_metrics.json
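
Perplexity here is taken to be the usual exponential of the mean cross-entropy loss (an assumption; the exact contents of training_metrics.json are not documented):

import math

def perplexity(mean_cross_entropy: float) -> float:
    # Standard definition: exp of the mean per-token cross-entropy (in nats).
    return math.exp(mean_cross_entropy)

print(perplexity(3.2))  # a validation loss of 3.2 corresponds to perplexity ≈ 24.5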

Technical Implementation

Attention Mechanism

  • Causal Masking: Supports autoregressive generation
  • Key Padding Mask: Enables batched inference
  • Scaled Dot-Product: Head dimension normalization included
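
A minimal reference sketch of this attention scheme (the released fused implementation may differ in detail):

import torch

def attention(q, k, v, key_padding_mask=None):
    # q, k, v: [batch, heads, seq, head_dim]; key_padding_mask: [batch, seq], True = padding
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5            # scaled dot-product
    seq_len = q.size(-2)
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                   device=q.device), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))              # causal masking
    if key_padding_mask is not None:                                # batched inference
        scores = scores.masked_fill(key_padding_mask[:, None, None, :], float("-inf"))
    return torch.softmax(scores, dim=-1) @ v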

Memory Optimization

  • Fused Operations: Reduces memory fragmentation
  • Mixed Precision: 30-40% memory reduction
  • Gradient Accumulation: Simulates larger batch sizes
  • Optional Quantization: Further model compression
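
For the optional quantization, one common route (an assumption, not necessarily what this project ships) is PyTorch dynamic INT8 quantization of the Linear layers, which mainly benefits CPU inference:

import torch

# Dynamic INT8 quantization of the Linear layers; a generic PyTorch technique,
# not a released quantized checkpoint. `model` is assumed to be loaded already.
quantized_model = torch.ao.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8,
)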

Training Stability

  • Gradient Clipping: Prevents exploding gradients
  • Automatic Loss Scaling: Mixed precision stability
  • Early Stopping: Prevents overfitting with patience mechanisms
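
Roughly, the memory and stability techniques above combine as in the following sketch. The names model and train_loader are hypothetical placeholders, the accumulation step count is illustrative, and the actual training script is not published here:

import torch

# Hypothetical objects: `model` and `train_loader` are assumed in scope, and
# each batch is assumed to include labels so the forward pass returns a loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scaler = torch.cuda.amp.GradScaler()       # automatic loss scaling for FP16
accum_steps = 8                            # gradient accumulation (illustrative value)

model.train()
for step, batch in enumerate(train_loader):
    with torch.autocast(device_type="cuda", dtype=torch.float16):  # or bfloat16
        loss = model(**batch).loss / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)    # gradient clipping
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)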

System Requirements

Memory Requirements

  • Training: 16-24GB VRAM (precision dependent)
  • Inference: 4-6GB VRAM for standard generation
  • Context: Maximum 2048 tokens input length
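
To stay inside the quoted inference budget, loading the weights in half precision on GPU is the usual approach (a sketch, assuming the standard from_pretrained path works for this repository):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "FlameF0X/SnowflakeCore-G1-Tiny-Instruct",
    torch_dtype=torch.float16,   # roughly halves weight memory vs FP32
    trust_remote_code=True,
).to("cuda")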

Generation Parameters

Default configuration:

{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
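
Because .generate() is unsupported, these defaults have to be applied manually. The helper below is a hypothetical sketch of top-k/top-p filtering that could slot into the sampling loop from Basic Usage; it is not part of the released code.

import torch

def filter_logits(logits, top_k=50, top_p=0.9):
    # Top-k: drop everything below the k-th highest logit.
    kth = torch.topk(logits, top_k)[0][..., -1, None]
    logits = logits.masked_fill(logits < kth, float("-inf"))
    # Top-p (nucleus): keep the smallest prefix of sorted tokens whose
    # cumulative probability reaches top_p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    probs = torch.softmax(sorted_logits, dim=-1)
    remove = probs.cumsum(dim=-1) - probs > top_p
    sorted_logits = sorted_logits.masked_fill(remove, float("-inf"))
    return logits.scatter(-1, sorted_idx, sorted_logits)

# Usage inside the sampling loop:
# probs = torch.softmax(filter_logits(logits), dim=-1)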

Limitations

  • No HuggingFace .generate() support: use a custom generation loop instead (see Basic Usage above)
  • Output Quality: May produce repetitive or nonsensical text for some prompts
  • Hardware Requirements: GPU recommended for practical inference
  • Context Window: Limited to 2048 tokens (or 512 tokens)

Example Output

# WIP

License & Acknowledgments

  • License: NCSAM
  • Framework: Built using PyTorch
  • Dataset: tatsu-lab/alpaca

Support Me

You can support me via Ko-fi, or try my Vast.ai template!

More metadata

  • Release date: July 10, 2025