Puchify T2-8B

Puchify T2-8B is an 8.2-billion-parameter large language model in the Puchify family. Built on the Qwen3-8B architecture, it targets complex, context-rich, and secure text generation and integrates the S.A.F.E (Safety Assurance For Expression) framework. The model handles dialogue, creative writing, summarization, coding, and educational tasks; supports extended context lengths, efficient reasoning, and effective safety controls; and is suitable for research, educational, and commercial deployment, provided its identity and attribution are preserved in all deployments and derivative works.

S.A.F.E Framework

Commitment to Safety

All Puchify models, including T2-8B, are governed by the S.A.F.E (Safety Assurance For Expression) framework as a foundational principle. S.A.F.E minimizes harmful, biased, or repetitive content while promoting clarity, engagement, and helpfulness. This framework is central to both model training and deployment, reflecting a commitment to responsible AI development.

For optimal safety and user experience, deploy Puchify T2-8B using the Hugging Face pipeline API:

from transformers import pipeline

# Downloads the model on first use and applies its built-in chat template
pipe = pipeline("text-generation", model="Puchify/PuchifyT2-8B")

messages = [
    {"role": "user", "content": "Who are you?"},
]
response = pipe(messages)
print(response)
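The pipeline returns a list of result dictionaries; in recent transformers releases, the assistant reply for chat-style input is found under the "generated_text" key of the first element.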

Key Features

  • Parameters: 8.2B
  • Layers: 36
  • Attention Heads: 32 for Q, 8 for KV (Grouped-Query Attention)
  • Hidden Size: 4096
  • Feed-Forward Network Size: 16,384
  • Context Length: 32,768 tokens natively; up to 131,072 tokens with YaRN
  • Tokenizer: SentencePiece-inspired BPE, 151,936-token vocabulary
  • Normalization: RMSNorm (ε = 1e-6)
  • Reasoning Modes: Seamless switching between “thinking” (for complex reasoning) and “non-thinking” (for efficient dialogue); see the sketch after this list
  • Multilingual: Supports 100+ languages and dialects
  • Agent Integration: Tool-calling and agentic workflows
  • Safety: S.A.F.E. framework for minimized bias and harm
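
Because T2-8B is built on Qwen3-8B, the two reasoning modes are presumably switched through the chat template. The following is a minimal sketch, assuming the Qwen3 enable_thinking flag carries over to this model:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Puchify/PuchifyT2-8B")
messages = [{"role": "user", "content": "Solve 23 * 17 step by step."}]

# enable_thinking toggles the reasoning trace; this flag is assumed to be
# inherited from the Qwen3 chat template and may differ in this repository.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for fast, non-reasoning dialogue
)
print(text)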

Installation

Puchify T2-8B can be accessed from the Hugging Face Model Hub or from local files. Ensure Python 3.8+ is installed along with the following libraries:

pip install "torch>=2.1" "transformers>=4.54"

For 4-bit quantized inference, install either bitsandbytes or AutoGPTQ.
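
For example, here is a minimal 4-bit loading sketch with bitsandbytes; the quantization settings shown are common defaults, not values verified for this model:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Common NF4 defaults; assumes a CUDA GPU with bitsandbytes installed
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # use torch.float16 on older GPUs
)

model = AutoModelForCausalLM.from_pretrained(
    "Puchify/PuchifyT2-8B",
    quantization_config=bnb_cfg,
    device_map="auto",
)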

Usage

To load the model manually from the Hugging Face Hub or a local path:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_path = "Puchify/PuchifyT2-8B"  # or your local path

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # Use torch.float16 if your GPU does not support BF16
    device_map="auto",
    trust_remote_code=True,
)

gen_cfg = GenerationConfig.from_pretrained(model_path)

prompt = "Explain the theory of relativity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Pass the config object itself; unpacking gen_cfg.to_dict() would leak
    # non-generation keys (e.g. transformers_version) into generate()
    outputs = model.generate(**inputs, generation_config=gen_cfg)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

A GPU with at least 16 GB VRAM is recommended for BF16/FP16 inference. With 4-bit quantization, inference is possible on GPUs with 8 GB VRAM.

Model Architecture

Puchify T2-8B is based on Qwen3-8B and uses 36 transformer layers, each with 4096 hidden units and 32 attention heads for queries (Q), with 8 heads for keys/values (KV) via Grouped-Query Attention (GQA). The feed-forward network size is 16,384. The model features rotary positional encoding (θ = 1,000,000) and RMSNorm (ε = 1e-6) for normalization. The tokenizer is a SentencePiece-inspired BPE with a 151,936-token vocabulary, including special start and end tokens.
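
The 32/8 head split means each key/value head is shared by a group of four query heads, shrinking the KV cache fourfold relative to full multi-head attention. A minimal, illustrative sketch of the sharing pattern (toy tensors, not the model's real projections):

import torch

n_q, n_kv, head_dim, seq = 32, 8, 128, 16  # head counts from the specs above
group = n_q // n_kv                        # 4 query heads per KV head

q = torch.randn(1, n_q, seq, head_dim)
k = torch.randn(1, n_kv, seq, head_dim)    # only 8 KV heads are stored/cached
v = torch.randn(1, n_kv, seq, head_dim)

# Expand KV heads so each group of 4 query heads attends to the same KV head
k = k.repeat_interleave(group, dim=1)      # (1, 32, 16, 128)
v = v.repeat_interleave(group, dim=1)

scores = q @ k.transpose(-2, -1) / head_dim**0.5
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)                           # torch.Size([1, 32, 16, 128])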

  • Parameter count: 8.2 billion (6.95B non-embedding)
  • Context window: 32,768 tokens natively, up to 131,072 tokens with YaRN scaling (see the sketch after this list)
  • Grouped-Query Attention: 32 heads for Q, 8 for KV
  • Tokenizer: SentencePiece-inspired BPE, 151,936-token vocabulary
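
Extending the context window beyond 32,768 tokens relies on YaRN rope scaling. A minimal sketch, assuming T2-8B exposes the same rope_scaling configuration as its Qwen3 base (the exact keys may differ in this repository):

from transformers import AutoModelForCausalLM
import torch

# Hypothetical YaRN override following the Qwen3 convention: factor 4.0
# stretches the native 32,768-token window to roughly 131,072 tokens.
model = AutoModelForCausalLM.from_pretrained(
    "Puchify/PuchifyT2-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
    max_position_embeddings=131072,
)

As with other YaRN-scaled models, enable this only when prompts actually exceed the native window, since static scaling can slightly degrade short-context quality.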

Repository Contents

The repository includes sharded model weights in safetensors format, a shard index mapping, architecture and generation configuration files, tokenizer assets, system prompt metadata, and files for added and special tokens.

Training Data

The base model was pre-trained on diverse, large-scale web corpora by Qwen. Puchify T2-8B was then further aligned and instruction-tuned using curated examples focused on safety, reasoning, and policy alignment. No private or proprietary user data was used during fine-tuning.

Intended Use and Limitations

Puchify T2-8B is designed for conversational agents, creative writing, summarization, code generation and explanation, and educational support. It is not intended for generating disallowed content (hate speech, extremism, explicit abuse, or disinformation), for providing medical or legal advice without expert oversight, or for real-time autonomous decision-making.

While the S.A.F.E framework reduces risks, some biases may persist. Human oversight is required for all deployments.

Licensing and Responsible Use

Puchify T2-8B is released under the OpenRAIL v1 license with additional terms to encourage broad adoption and responsible innovation. The model and its derivatives may be used for both non-commercial and commercial purposes, including integration into products and services, as long as the name “Puchify T2-8B” remains prominent and unchanged in all deployments, redistributions, and derivative works. This ensures attribution, transparency, and consistency across the ecosystem.

You may create, quantize, compress, or convert the model and share optimized versions freely or as part of commercial offerings, provided that the original model identity is preserved and clearly attributed. Removal or misrepresentation of the model’s origin is not permitted.

Redistributions must include this documentation and full attribution to Puchify Inc. For further details, see LICENSE.txt.

Citation

If you use Puchify T2-8B in your work, please cite:

@misc{puchify2025t2-8b,
  title   = {Puchify T2-8B: Advanced Safe & Reasoning-centric Hybrid Model},
  author  = {Puchify Inc.},
  year    = {2025},
  url     = {https://puchify.ai/models/t2-8b}
}