Puchify T2 Pro-8B

Puchify T2 Pro-8B is an 8.2-billion-parameter large language model and the newest member of the Puchify model family. Built on the proven Qwen3-8B architecture, T2 Pro-8B adds refined engineering and advanced safety mechanisms for complex, context-rich, and secure text generation. Through deep integration of the S.A.F.E (Safety Assurance For Expression) framework, T2 Pro-8B delivers strong performance across dialogue, creative writing, summarization, coding, and educational tasks.

This model is carefully optimized for maximum accessibility and responsible deployment in both personal and commercial environments, offering robust reliability and expressive language capabilities for a wide spectrum of applications. Its architecture is built for extended context comprehension, efficient and nuanced reasoning, and comprehensive safety controls, making it an ideal choice for research, educational, and commercial integration—provided its identity and attribution are maintained in all deployments and derivative works.


S.A.F.E Framework

Commitment to Safety

Puchify T2 Pro-8B is governed by the S.A.F.E (Safety Assurance For Expression) framework, anchoring its development in responsible and transparent AI principles. S.A.F.E works to minimize harmful, biased, or repetitive content while promoting clarity, engagement, and helpfulness. This framework is deeply embedded in both model training and deployment, reflecting a strong commitment to ethical AI advancement.

For optimal safety and user experience, deploy Puchify T2 Pro-8B using the Hugging Face pipeline API:

from transformers import pipeline

# The pipeline applies the model's chat template to the messages automatically.
pipe = pipeline("text-generation", model="Puchify/PuchifyT2-Pro-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
response = pipe(messages)
print(response[0]["generated_text"][-1]["content"])  # the assistant's reply

Key Features

  • Parameters: 8.2B
  • Layers: 36
  • Attention Heads: 32 query heads, 8 key/value heads (Grouped-Query Attention)
  • Hidden Size: 4096
  • Feed-Forward Network Size: 16,384
  • Context Length: 32,768 tokens natively; up to 131,072 tokens with YaRN
  • Tokenizer: SentencePiece-inspired BPE, 151,936-token vocabulary
  • Normalization: RMSNorm (ε = 1e-6)
  • Reasoning Modes: Effortless switching between “thinking” (for intricate reasoning) and “non-thinking” (for rapid dialogue); see the example after this list
  • Multilingual: Supports 100+ languages and dialects
  • Agent Integration: Built-in tool-calling and agentic workflows
  • Safety: Enhanced S.A.F.E. framework for minimized bias and harm
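
Since T2 Pro-8B builds on Qwen3-8B, the reasoning modes can likely be toggled through the chat template's enable_thinking flag, as in the base model. A minimal sketch, assuming that flag is inherited (verify against the shipped tokenizer config):

from transformers import AutoTokenizer

# Assumption: the Qwen3-style `enable_thinking` chat-template flag is
# inherited from the base architecture.
tokenizer = AutoTokenizer.from_pretrained("Puchify/PuchifyT2-Pro-8B", trust_remote_code=True)

messages = [{"role": "user", "content": "Solve 24 * 17 step by step."}]

# “Thinking” mode: the template opens a reasoning scratchpad before the answer.
prompt_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# “Non-thinking” mode: direct replies for rapid dialogue.
prompt_fast = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)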

Installation

Puchify T2 Pro-8B can be accessed from the Hugging Face Model Hub or from local files. Ensure Python 3.8+ is installed along with the following libraries:

pip install "torch>=2.1" "transformers>=4.54"

For 4-bit quantized inference, install either bitsandbytes or AutoGPTQ.
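
A minimal 4-bit loading sketch with bitsandbytes (the quantization settings below are illustrative defaults, not Puchify-published values):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit settings; tune for your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Puchify/PuchifyT2-Pro-8B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Puchify/PuchifyT2-Pro-8B", trust_remote_code=True)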


Usage

To load the model manually from the Hugging Face Hub or a local path:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_path = "Puchify/PuchifyT2-Pro-8B"  # or your local path

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # Use torch.float16 if your GPU does not support BF16
    device_map="auto",
    trust_remote_code=True,
)

gen_cfg = GenerationConfig.from_pretrained(model_path)

prompt = "Explain the theory of relativity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, generation_config=gen_cfg)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

A GPU with at least 16 GB VRAM is recommended for BF16/FP16 inference. With 4-bit quantization, inference is possible on GPUs with 8 GB VRAM.
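
For multi-turn conversations, format the input with the tokenizer's chat template rather than passing raw text. A sketch using the standard transformers chat-template API, reusing the tokenizer, model, and gen_cfg objects from above:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the theory of relativity in simple terms."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(input_ids, generation_config=gen_cfg)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))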


Model Architecture

T2 Pro-8B’s architecture is rooted in Qwen3-8B, featuring 36 transformer layers, each with 4096 hidden units and 32 attention heads for queries (Q), with 8 heads for key/value (KV) in Grouped-Query Attention. The feed-forward network is sized at 16,384. The model employs rotary positional encoding (θ = 1,000,000) and RMSNorm (ε = 1e-6) for robust normalization. Its tokenizer is a SentencePiece-style BPE with a 151,936-token vocabulary, including special start and end tokens.
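
As a back-of-the-envelope illustration of what Grouped-Query Attention buys, the key/value cache stores only the 8 KV heads per layer rather than all 32 query heads, shrinking it fourfold (head dimension inferred here as hidden size divided by query heads):

# KV-cache estimate from the figures in this section (illustrative arithmetic).
hidden_size, n_q_heads, n_kv_heads, n_layers = 4096, 32, 8, 36
head_dim = hidden_size // n_q_heads           # 128, inferred: 4096 / 32
seq_len, bytes_per_value = 32_768, 2          # native context, BF16

kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value  # 2x: keys + values
print(f"{kv_cache_bytes / 2**30:.1f} GiB per sequence")  # ~4.5 GiB, vs ~18 GiB with 32 KV heads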

  • Parameter count: 8.2 billion (6.95B non-embedding)
  • Context window: 32,768 tokens natively, up to 131,072 tokens with YaRN scaling (see the sketch after this list)
  • Grouped-Query Attention: 32 query heads, 8 key/value heads
  • Tokenizer: SentencePiece-style BPE with a 151,936-token vocabulary
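
To reach the extended 131,072-token window, YaRN scaling can be enabled at load time. A sketch assuming the Qwen-style rope_scaling schema carries over from the base model (verify the exact keys against the shipped config):

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Assumption: Qwen-style YaRN schema; factor 4.0 stretches the native
# 32,768-token window to roughly 131,072 tokens.
config = AutoConfig.from_pretrained("Puchify/PuchifyT2-Pro-8B", trust_remote_code=True)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072  # advertise the extended window

model = AutoModelForCausalLM.from_pretrained(
    "Puchify/PuchifyT2-Pro-8B",
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)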

Repository Contents

The repository includes:

  • Sharded model weights in safetensors format
  • Shard index mapping
  • Architecture and generation configuration files
  • Tokenizer assets
  • System prompt metadata
  • Files for added and special tokens

Training Data

The base model was pre-trained on diverse, large-scale web corpora by Qwen. T2 Pro-8B was then further aligned and instruction-tuned using carefully curated examples focused on safety, advanced reasoning, and policy alignment. No private or proprietary user data was used during fine-tuning.


Intended Use and Limitations

Puchify T2 Pro-8B is engineered for:

  • Conversational agents
  • Creative writing
  • Summarization
  • Code generation and explanation
  • Educational support

It is not intended for generating disallowed content such as hate speech, extremism, explicit abuse, or disinformation, nor for providing medical or legal advice without expert oversight, nor for real-time autonomous decision-making.

While the S.A.F.E framework reduces risks, some biases may persist. Human oversight is required for all deployments.


Licensing and Responsible Use

Puchify T2 Pro-8B is released under the OpenRAIL v1 license with additional terms to encourage broad adoption and responsible innovation. The model and its derivatives may be used for both non-commercial and commercial purposes, including integration into products and services, as long as the name “Puchify T2 Pro-8B” remains prominent and unchanged in all deployments, redistributions, and derivative works. This ensures attribution, transparency, and consistency across the ecosystem.

You may create, quantize, compress, or convert the model and share optimized versions freely or as part of commercial offerings, provided that the original model identity is preserved and clearly attributed. Removal or misrepresentation of the model’s origin is not permitted.

Redistributions must include this documentation and full attribution to Puchify Inc. For further details, see LICENSE.txt.


Citation

If you use Puchify T2 Pro-8B in your work, please cite:

@misc{puchify2025t2-pro-8b,
  title   = {Puchify T2 Pro-8B: Advanced Safe & Reasoning-centric Hybrid Model},
  author  = {Puchify Inc.},
  year    = {2025},
  url     = {https://puchify.ai/models/t2-pro-8b}
}