A newer version of this model is available: FlameF0X/SnowflakeCore-G1-Tiny2

SnowflakeCore-G1-Tiny

A custom GPT-style transformer language model built from scratch using PyTorch, trained on the Mixture-of-Thoughts dataset for enhanced reasoning capabilities.

Model Overview

SnowflakeCore-G1-Tiny is a GPT-style autoregressive transformer model with ~400M parameters designed for text generation tasks.

Key Features

  • 2048 token context window for extended conversations
  • Mixed precision training (BF16/FP16) for efficiency
  • Custom attention implementation with fused operations
  • Early stopping mechanisms for optimal training
  • Gradient accumulation for effective large batch training

Architecture Specifications

Component           Value
Model Type          Autoregressive Transformer
Parameters          ~400M
Layers              24
Hidden Size         1024
Attention Heads     16
Head Dimension      64
FFN Dimension       4096
Context Length      2048 tokens
Vocabulary Size     50,257 (GPT-2 tokenizer)
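
These numbers can be sanity-checked with a rough back-of-the-envelope count. The sketch below assumes a standard GPT-2-style layer layout (fused Q/K/V plus output projection, a two-matrix FFN, learned position embeddings) and an untied LM head, ignoring biases and LayerNorms; the actual breakdown may differ slightly.

# Rough parameter count from the table above (assumptions noted in the text)
hidden, layers, ffn, vocab, ctx = 1024, 24, 4096, 50257, 2048

embeddings = vocab * hidden + ctx * hidden   # token + position embeddings
attention  = 4 * hidden * hidden             # Q, K, V and output projections per layer
ffn_block  = 2 * hidden * ffn                # up- and down-projection per layer
lm_head    = vocab * hidden                  # assuming an untied output head

total = embeddings + layers * (attention + ffn_block) + lm_head
print(f"~{total / 1e6:.0f}M parameters")     # ~407M, consistent with "~400M"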

Quick Start

Installation

pip install torch transformers # if not already installed

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "FlameF0X/SnowflakeCore-G1-Tiny",
    trust_remote_code=True,
    force_download=True,
    use_safetensors=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "FlameF0X/SnowflakeCore-G1-Tiny",
    trust_remote_code=True,
    force_download=True,
)

def custom_greedy_generate(prompt, max_length=50):
    model.eval()
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    generated = input_ids
    
    with torch.no_grad():
        for _ in range(max_length):
            outputs = model(input_ids=generated)
            next_token_logits = outputs["logits"][:, -1, :]
            next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
            generated = torch.cat((generated, next_token_id), dim=1)
            
            if next_token_id.item() == tokenizer.eos_token_id:
                break
                
    return tokenizer.decode(generated[0], skip_special_tokens=True)

# Generate text
prompt = "Once upon a time"
result = custom_greedy_generate(prompt)
print(result)
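
If a GPU is available, the model can be moved to it first; this is a minimal optional tweak, and the inputs created inside custom_greedy_generate would need the same .to(device) treatment.

# Optional: run inference on GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
# inside custom_greedy_generate: input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)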

Fine-Tuning

import os
import argparse
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset
import torch

# === Disable W&B logging ===
os.environ["WANDB_DISABLED"] = "true"

# === Config ===
config = {
    "model_name": "FlameF0X/SnowflakeCore-G1-Tiny",
    "output_dir": "./snowflake-chatbot",
    "context_window": 512,
    "per_device_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "max_steps": 500,
    "dataloader_workers": 4,
    "dataset_name": "tatsu-lab/alpaca",
    "dataset_split": "train[:10000]",
}

# === Derived ===
config["effective_batch_size"] = (
    config["per_device_batch_size"] * config["gradient_accumulation_steps"]
)

print(f"Effective batch size: {config['effective_batch_size']}")
print(f"Context window: {config['context_window']}")


# === 1. Load tokenizer and model ===
def load_model_and_tokenizer(config):
    print(f"Loading model and tokenizer from {config['model_name']}...")
    tokenizer = AutoTokenizer.from_pretrained(
        config["model_name"],
        trust_remote_code=True,
        force_download=True,
        model_max_length=config["context_window"],
    )
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # padding="max_length" below needs a pad token
    model = AutoModelForCausalLM.from_pretrained(
        config["model_name"],
        trust_remote_code=True,
        force_download=True,
        use_safetensors=True,
    )

    if hasattr(torch, "compile"):
        try:
            print("Compiling model with torch.compile...")
            model = torch.compile(model)
        except Exception as e:
            print(f"Compilation failed: {e}")
    return tokenizer, model


# === 2. Load dataset ===
def load_custom_dataset(name, split):
    print(f"Loading dataset: {name} ({split})...")
    return load_dataset(name, split=split)


# === 3. Format dataset ===
def format_example(example):
    """Update this function to work with different datasets."""
    return {
        "text": f"### Instruction:\n{example['instruction']}\n### Input:\n{example['input']}\n### Response:\n{example['output']}"
    }


# === 4. Tokenize ===
def tokenize_example(example, tokenizer, max_length):
    tokens = tokenizer(
        example["text"],
        truncation=True,
        padding="max_length",
        max_length=max_length,
    )
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens


# === 5. Train ===
def train_model(model, tokenizer, tokenized_dataset, config):
    print("Preparing training arguments...")
    training_args = TrainingArguments(
        output_dir=config["output_dir"],
        per_device_train_batch_size=config["per_device_batch_size"],
        gradient_accumulation_steps=config["gradient_accumulation_steps"],
        max_steps=config["max_steps"],
        logging_dir="./logs",
        logging_steps=20,
        save_strategy="no",
        fp16=torch.cuda.is_available() and not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_available() and torch.cuda.is_bf16_supported(),
        overwrite_output_dir=True,
        report_to=[],
        dataloader_num_workers=config["dataloader_workers"],
        optim="adamw_torch_fused" if torch.cuda.is_available() and hasattr(torch, 'compile') else "adamw_torch",
        remove_unused_columns=False,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
    )

    print("Starting training...")
    trainer.train()
    print("Training completed.")


# === 6. Save ===
def save_model(model, tokenizer, output_dir):
    print(f"Saving model to {output_dir}...")
    model.save_pretrained(output_dir, safe_serialization=False)
    tokenizer.save_pretrained(output_dir)
    print("Model saved.")


# === Main ===
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", type=str, default=config["dataset_name"])
    parser.add_argument("--split", type=str, default=config["dataset_split"])
    args = parser.parse_args()

    tokenizer, model = load_model_and_tokenizer(config)
    dataset = load_custom_dataset(args.dataset, args.split)

    print("Formatting dataset...")
    dataset = dataset.map(format_example, num_proc=config["dataloader_workers"], load_from_cache_file=False)

    print("Tokenizing dataset...")
    tokenized = dataset.map(
        lambda x: tokenize_example(x, tokenizer, config["context_window"]),
        batched=True,
        num_proc=config["dataloader_workers"],
        load_from_cache_file=False,
    )
    tokenized.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])

    train_model(model, tokenizer, tokenized, config)
    save_model(model, tokenizer, config["output_dir"])


if __name__ == "__main__":
    main()
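
Saved as a standalone script (the filename below is just a placeholder), it runs with the defaults or can be pointed at another instruction-style dataset via the CLI flags; format_example must match the chosen dataset's columns.

python finetune.py
python finetune.py --dataset tatsu-lab/alpaca --split "train[:10000]"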

Training Details

Dataset

The model was trained on the Mixture-of-Thoughts dataset, chosen to strengthen its reasoning capabilities.

Training Configuration

  • Framework: PyTorch with mixed precision (BF16/FP16)
  • Optimizer: AdamW (learning rate: 2e-4)
  • Batch Size: 1 with gradient accumulation (32 steps)
  • Context Window: 2048 tokens
  • Validation Split: 10%
  • Early Stopping: Implemented at epoch and step levels

Performance Monitoring

  • Training loss tracked per epoch with perplexity calculation (illustrated below)
  • Full validation after each epoch
  • Step-level monitoring every 500 steps
  • Comprehensive metrics saved in training_metrics.json
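
Perplexity here is just the exponential of the mean cross-entropy loss; a minimal sketch:

import math

def perplexity(mean_loss: float) -> float:
    # exp of the average (natural-log) cross-entropy loss
    return math.exp(mean_loss)

print(perplexity(3.2))  # a loss of 3.2 corresponds to a perplexity of ~24.5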

Technical Implementation

Attention Mechanism

  • Causal Masking: Supports autoregressive generation
  • Key Padding Mask: Enables batched inference
  • Scaled Dot-Product: Head dimension normalization included (see the sketch below)
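
A minimal sketch of the attention math described above, with the 1/sqrt(head_dim) scaling and a causal mask; the model's fused implementation is more involved, and the key-padding mask (an extra masked_fill over padded positions) is omitted here.

import math
import torch

def causal_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    head_dim, seq_len = q.size(-1), q.size(-2)
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)     # scaled dot-product
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device))
    scores = scores.masked_fill(~causal, float("-inf"))        # block attention to future tokens
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 16, 8, 64)   # 16 heads, head_dim 64 as in the specs above
print(causal_attention(q, k, v).shape)  # torch.Size([1, 16, 8, 64])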

Memory Optimization

  • Fused Operations: Reduces memory fragmentation
  • Mixed Precision: 30-40% memory reduction
  • Gradient Accumulation: Simulates larger batch sizes (sketched below)
  • Optional Quantization: Further model compression
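
A minimal sketch of how mixed precision and gradient accumulation combine in a training loop. The model, optimizer, and micro-batches below are toy stand-ins, not the actual training code; on GPU the autocast device_type would be "cuda".

import torch
from torch import nn

model = nn.Linear(16, 16)                                   # toy stand-in for the transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
micro_batches = [torch.randn(1, 16) for _ in range(64)]     # per-device batch size 1

accum_steps = 32                                            # 1 x 32 = effective batch of 32
optimizer.zero_grad()
for step, x in enumerate(micro_batches):
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):  # BF16 mixed precision
        loss = nn.functional.mse_loss(model(x), x) / accum_steps   # scale loss across the window
    loss.backward()                                         # gradients accumulate in place
    if (step + 1) % accum_steps == 0:
        optimizer.step()                                    # one optimizer step per 32 micro-batches
        optimizer.zero_grad()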

Training Stability

  • Gradient Clipping: Prevents exploding gradients
  • Automatic Loss Scaling: Mixed precision stability (combined with clipping in the sketch below)
  • Early Stopping: Prevents overfitting with patience mechanisms
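
A minimal sketch of the two stability mechanisms working together: GradScaler handles FP16 loss scaling (a no-op on CPU or under BF16), and clip_grad_norm_ caps the gradient norm. Again, the model and data are toy placeholders.

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 16).to(device)                          # toy stand-in
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scaler = torch.cuda.amp.GradScaler(enabled=device == "cuda")  # automatic FP16 loss scaling

for _ in range(8):
    x = torch.randn(4, 16, device=device)
    optimizer.zero_grad()
    with torch.autocast(device_type=device,
                        dtype=torch.float16 if device == "cuda" else torch.bfloat16):
        loss = nn.functional.mse_loss(model(x), x)
    scaler.scale(loss).backward()                             # scale loss to avoid FP16 underflow
    scaler.unscale_(optimizer)                                # unscale so clipping sees true gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # prevent exploding gradients
    scaler.step(optimizer)                                    # step is skipped if gradients overflowed
    scaler.update()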

System Requirements

Memory Requirements

  • Training: 16-24GB VRAM (precision dependent)
  • Inference: 4-6GB VRAM for standard generation
  • Context: Maximum 2048 tokens input length

Generation Parameters

Default configuration:

{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
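
Since .generate() is not supported (see Limitations), these parameters have to be applied by hand. A minimal sampling loop on top of the model and tokenizer loaded in Quick Start, using the temperature/top-k/top-p values above:

def sample_generate(prompt, max_new_tokens=50, temperature=1.0, top_k=50, top_p=0.9):
    model.eval()
    generated = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(input_ids=generated)["logits"][:, -1, :] / temperature
            topk_logits, topk_ids = torch.topk(logits, top_k, dim=-1)   # keep the k best tokens
            probs = torch.softmax(topk_logits, dim=-1)                  # sorted descending by topk
            cumulative = torch.cumsum(probs, dim=-1)
            probs[cumulative - probs > top_p] = 0.0                     # nucleus (top-p) cutoff
            probs = probs / probs.sum(dim=-1, keepdim=True)
            choice = torch.multinomial(probs, num_samples=1)            # sample within the top-k set
            next_id = topk_ids.gather(-1, choice)
            generated = torch.cat((generated, next_id), dim=1)
            if next_id.item() == tokenizer.eos_token_id:
                break
    return tokenizer.decode(generated[0], skip_special_tokens=True)

print(sample_generate("Once upon a time"))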

Model Files

The repository contains:

  • pytorch_model.bin - PyTorch model weights
  • model.safetensors - SafeTensors format weights
  • config.json - Model configuration
  • generation_config.json - Generation parameters
  • training_metrics.json - Training statistics
  • tokenizer.json - Tokenizer configuration
  • vocab.json & merges.txt - Vocabulary files

Limitations

  • No HuggingFace .generate() support: Use custom generation function
  • Output Quality: May produce repetitive or nonsensical text for some prompts
  • Hardware Requirements: GPU recommended for practical inference
  • Context Window: Limited to 2048 tokens
  • Dataset Dependency: Performance tied to Mixture-of-Thoughts dataset quality

Example Output

Input: Hello, I am Alex and

Output: Hello, I am Alex andbourg Chip Chip Chip Chip Chip Chip Chip ChipCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCosCos

Note: The repetitive output shown is typical for small or early-stage models and can be improved with further training or fine-tuning.

Support Me

You can support me via Ko-fi, or try my Vast.ai template!

Metadata

  • Release date: June 29, 2025.