---
license: apache-2.0
datasets:
- common-pile/wikimedia_filtered
language:
- en
library_name: transformers
tags:
- pre-train
- custom_code
- SnowflakeCore
pipeline_tag: text-generation
---

# SnowflakeCore-G1-Tiny2

An improved version of SnowflakeCore-G1-Tiny: a custom GPT-style transformer language model built from scratch in PyTorch and trained on the `common-pile/wikimedia_filtered` dataset.

## Model Overview

SnowflakeCore-G1-Tiny2 is a GPT-style autoregressive transformer model with **\~356M parameters** designed for text generation tasks.

### Key Features

* **2048 token context window** for extended conversations
* **Mixed precision training** (BF16/FP16) for efficiency
* **Custom attention implementation** with fused operations
* **Early stopping mechanisms** for optimal training
* **Gradient accumulation** for effective large batch training

### Architecture Specifications

| Component       | Value                      |
| --------------- | -------------------------- |
| Model Type      | Autoregressive Transformer |
| Parameters      | \~356M                     |
| Layers          | 24                         |
| Hidden Size     | 1024                       |
| Attention Heads | 16                         |
| Head Dimension  | 64                         |
| FFN Dimension   | 4096                       |
| Context Length  | 2048 tokens                |
| Vocabulary Size | 50,257 (GPT-2 tokenizer)   |

## Model Benchmarks

The following benchmarks compare `SnowflakeCore-G1-Tiny2`, its predecessor, and GPT-2 on key performance and text quality metrics.

### Performance & Quality Metrics

| Model                      | Params | Size (MB) | Speed (tok/s) | Vocab Div. | Dist. Bigrams | Dist. Trigrams | Bigram Repet. | Trigram Repet. |
| -------------------------- | ------ | --------- | ------------- | ---------- | ------------- | -------------- | ------------- | -------------- |
| **SnowflakeCore-G1-Tiny2** | 355.9M | 1357.54   | 22.13         | **0.3440** | **0.7408**    | **0.8834**     | **0.2592**    | **0.1166**     |
| SnowflakeCore-G1-Tiny      | 355.9M | 1357.54   | 22.12         | 0.2780     | 0.6111        | 0.7421         | 0.3889        | 0.2579         |
| GPT-2 (small)              | 124.4M | 474.70    | **47.73**     | 0.2590     | 0.6408        | 0.7946         | 0.3592        | 0.2054         |

> **Notes:**
>
> * Vocabulary Diversity = unique tokens / total tokens
> * Distinct N-grams = unique n-grams / total n-grams
> * Lower repetition rates indicate better text novelty
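The diversity and repetition numbers follow directly from the definitions in the notes above. The sketch below recomputes them for an arbitrary token sequence; how the benchmark text was generated and tokenized is not specified here, so treat it as an illustration of the formulas rather than the exact benchmark script.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a flat token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def text_quality_metrics(tokens):
    """Compute the diversity/repetition metrics reported in the table above.

    `tokens` can be any flat list of token ids or token strings; the
    tokenizer and prompts used for the published numbers are not shown here.
    """
    bigrams = ngrams(tokens, 2)
    trigrams = ngrams(tokens, 3)
    dist_bi = len(set(bigrams)) / max(len(bigrams), 1)
    dist_tri = len(set(trigrams)) / max(len(trigrams), 1)
    return {
        "vocab_diversity": len(set(tokens)) / max(len(tokens), 1),
        "distinct_bigrams": dist_bi,
        "distinct_trigrams": dist_tri,
        "bigram_repetition": 1.0 - dist_bi,    # repetition is the complement of distinctness
        "trigram_repetition": 1.0 - dist_tri,
    }


# Example on a toy whitespace-tokenized string
print(text_quality_metrics("the cat sat on the mat and the cat slept".split()))
```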
### Memory Usage (CPU)

All models report `N/A` for CPU memory usage across all sequence lengths.

| Sequence Length | SnowflakeCore-G1-Tiny | SnowflakeCore-G1-Tiny2 | GPT-2 |
| --------------- | --------------------- | ---------------------- | ----- |
| 128             | N/A (CPU)             | N/A (CPU)              | N/A   |
| 512             | N/A (CPU)             | N/A (CPU)              | N/A   |
| 1024            | N/A (CPU)             | N/A (CPU)              | N/A   |
| 2048            | N/A (CPU)             | N/A (CPU)              | N/A   |

## Quick Start

### Installation

```bash
pip install torch transformers  # if not already installed
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "FlameF0X/SnowflakeCore-G1-Tiny2",
    trust_remote_code=True,
    use_safetensors=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "FlameF0X/SnowflakeCore-G1-Tiny2",
    trust_remote_code=True,
)


def custom_greedy_generate(prompt, max_length=50):
    """Greedy decoding loop; HuggingFace `.generate()` is not supported."""
    model.eval()
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    generated = input_ids
    with torch.no_grad():
        for _ in range(max_length):
            outputs = model(input_ids=generated)
            # Take the logits at the last position and pick the most likely token
            next_token_logits = outputs["logits"][:, -1, :]
            next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
            generated = torch.cat((generated, next_token_id), dim=1)
            if next_token_id.item() == tokenizer.eos_token_id:
                break
    return tokenizer.decode(generated[0], skip_special_tokens=True)


# Generate text
prompt = "Once upon a time"
result = custom_greedy_generate(prompt)
print(result)
```

### Fine-Tuning

... (same fine-tuning code as above) ...

## Training Details

### Dataset

* **Source**: [common-pile/wikimedia_filtered](https://huggingface.co/datasets/common-pile/wikimedia_filtered)

### Training Configuration

* **Framework**: PyTorch with mixed precision (BF16/FP16)
* **Optimizer**: AdamW (learning rate: 2e-4)
* **Batch Size**: 1 with gradient accumulation (32 steps)
* **Context Window**: 2048 tokens
* **Validation Split**: 10%
* **Early Stopping**: Implemented at epoch and step levels

### Performance Monitoring

* Training loss tracked per epoch with perplexity calculation
* Full validation after each epoch
* Step-level monitoring every 500 steps
* Comprehensive metrics saved in `training_metrics.json`

## Technical Implementation

### Attention Mechanism

* **Causal Masking**: Supports autoregressive generation
* **Key Padding Mask**: Enables batched inference
* **Scaled Dot-Product**: Head dimension normalization included

### Memory Optimization

* **Fused Operations**: Reduces memory fragmentation
* **Mixed Precision**: 30-40% memory reduction
* **Gradient Accumulation**: Simulates larger batch sizes
* **Optional Quantization**: Further model compression

### Training Stability

* **Gradient Clipping**: Prevents exploding gradients
* **Automatic Loss Scaling**: Mixed precision stability
* **Early Stopping**: Prevents overfitting with patience mechanisms

## System Requirements

### Memory Requirements

* **Training**: 16-24GB VRAM (precision dependent)
* **Inference**: 1-6GB VRAM for standard generation
* **Context**: Maximum 2048 tokens input length

### Generation Parameters

Default configuration:

```json
{
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 0.9,
  "top_k": 50,
  "max_new_tokens": 50,
  "pad_token_id": 50256,
  "eos_token_id": 50256
}
```
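Because HuggingFace `.generate()` is not supported (see Limitations below), these defaults have to be applied by hand. The sketch below reuses the `model` and `tokenizer` from Basic Usage and applies temperature scaling, then top-k, then top-p filtering before sampling; it is an illustrative implementation of the defaults above, not the repository's own generation code, and the filtering order is an assumption.

```python
import torch


def custom_sample_generate(prompt, max_new_tokens=50, temperature=1.0, top_k=50, top_p=0.9):
    """Sampling loop applying the generation_config.json defaults manually."""
    model.eval()
    generated = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(max_new_tokens):
            # Logits of the last position, scaled by temperature
            logits = model(input_ids=generated)["logits"][:, -1, :] / temperature

            # Top-k filtering: drop everything below the k-th largest logit
            if top_k > 0:
                kth_best = torch.topk(logits, top_k).values[:, -1, None]
                logits = logits.masked_fill(logits < kth_best, float("-inf"))

            # Top-p (nucleus) filtering: drop the tail once cumulative probability exceeds top_p
            sorted_logits, sorted_indices = torch.sort(logits, descending=True)
            cumulative_probs = torch.cumsum(torch.softmax(sorted_logits, dim=-1), dim=-1)
            remove = cumulative_probs > top_p
            remove[..., 1:] = remove[..., :-1].clone()  # always keep the single best token
            remove[..., 0] = False
            logits = logits.masked_fill(remove.scatter(1, sorted_indices, remove), float("-inf"))

            # Sample the next token from the filtered distribution
            next_token_id = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)
            generated = torch.cat((generated, next_token_id), dim=1)
            if next_token_id.item() == tokenizer.eos_token_id:
                break
    return tokenizer.decode(generated[0], skip_special_tokens=True)


print(custom_sample_generate("Once upon a time"))
```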
## Model Files

The repository contains:

* `pytorch_model.bin` - PyTorch model weights
* `model.safetensors` - SafeTensors format weights
* `config.json` - Model configuration
* `generation_config.json` - Generation parameters
* `training_metrics.json` - Training statistics
* `tokenizer.json` - Tokenizer configuration
* `vocab.json` & `merges.txt` - Vocabulary files

## Limitations

* **No HuggingFace `.generate()` support**: Use the custom generation functions shown above
* **Output Quality**: May produce repetitive or nonsensical text for some prompts
* **Hardware Requirements**: GPU recommended for practical inference
* **Context Window**: Limited to 2048 tokens
* **Dataset Dependency**: Performance is tied to the quality of the common-pile/wikimedia_filtered dataset

## Example Output

```
N/A
```

## Support Me

You can support me via [Ko-fi](https://ko-fi.com/flamef0x), or try my [Vast.ai](https://cloud.vast.ai/?ref_id=222345&creator_id=222345&name=Efficient%20Pretraining%20GPU%20Template) template!

### Small meta-data

* Release date: July 21, 2025.