Puchify T2 Pro-8B
Puchify T2 Pro-8B is an 8.2-billion-parameter large language model and the flagship of the Puchify model family. Built on the proven Qwen3-8B architecture, T2 Pro-8B adds refined engineering and advanced safety mechanisms for complex, context-rich, and secure text generation. Through deep integration of the next-generation S.A.F.E (Safety Assurance For Expression) framework, it delivers strong performance across dialogue, creative writing, summarization, coding, and educational tasks.
The model is optimized for accessible, responsible deployment in both personal and commercial environments, offering robust reliability and expressive language capabilities across a wide spectrum of applications. Its architecture supports extended context comprehension, efficient and nuanced reasoning, and comprehensive safety controls, making it well suited to research, educational, and commercial integration, provided its identity and attribution are maintained in all deployments and derivative works.
S.A.F.E Framework
Commitment to Safety
Puchify T2 Pro-8B is governed by the S.A.F.E (Safety Assurance For Expression) framework, anchoring its development in responsible and transparent AI principles. S.A.F.E works to minimize harmful, biased, or repetitive content while promoting clarity, engagement, and helpfulness. This framework is deeply embedded in both model training and deployment, reflecting a strong commitment to ethical AI advancement.
For optimal safety and user experience, deploy Puchify T2 Pro-8B using the Hugging Face pipeline API:
from transformers import pipeline

pipe = pipeline("text-generation", model="Puchify/PuchifyT2-Pro-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
response = pipe(messages)
# For chat-format input, the assistant reply is the last message in
# response[0]["generated_text"].
print(response)
Key Features
- Parameters: 8.2B
- Layers: 36
- Attention Heads: 32 for Q, 8 for KV (Grouped-Query Attention, GQA)
- Hidden Size: 4096
- Feed-Forward Network Size: 16,384
- Context Length: 32,768 tokens natively; up to 131,072 tokens with YaRN
- Tokenizer: SentencePiece-inspired BPE, 151,936-token vocabulary
- Normalization: RMSNorm (ε = 1e-6)
- Reasoning Modes: Switch between “thinking” (for intricate reasoning) and “non-thinking” (for rapid dialogue); see the sketch after this list
- Multilingual: Supports 100+ languages and dialects
- Agent Integration: Built-in tool-calling and agentic workflows
- Safety: Enhanced S.A.F.E framework for minimized bias and harm
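Mode switching follows the Qwen3 convention. Below is a minimal sketch, assuming the model inherits Qwen3's chat template, which exposes an enable_thinking flag through apply_chat_template:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Puchify/PuchifyT2-Pro-8B", trust_remote_code=True)
messages = [{"role": "user", "content": "What is 17 * 24?"}]

# enable_thinking=True renders the template in "thinking" mode for intricate
# reasoning; set it to False for rapid, direct dialogue.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # assumption: flag name follows the Qwen3 template
)
print(prompt)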
Installation
Puchify T2 Pro-8B can be accessed from the Hugging Face Model Hub or from local files. Ensure Python 3.8+ is installed along with the following libraries:
pip install "torch>=2.1" "transformers>=4.54"
For 4-bit quantized inference, install either bitsandbytes or AutoGPTQ.
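For example, assuming the standard PyPI package names:
pip install bitsandbytes  # for 4-bit loading via BitsAndBytesConfig
pip install auto-gptq     # for GPTQ-quantized checkpoints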
Usage
To load the model manually from the Hugging Face Hub or a local path:
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_path = "Puchify/PuchifyT2-Pro-8B"  # or your local path

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # use torch.float16 if your GPU does not support BF16
    device_map="auto",
    trust_remote_code=True,
)

# Reuse the sampling defaults shipped with the model.
gen_cfg = GenerationConfig.from_pretrained(model_path)

prompt = "Explain the theory of relativity in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, generation_config=gen_cfg, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
A GPU with at least 16 GB VRAM is recommended for BF16/FP16 inference. With 4-bit quantization, inference is possible on GPUs with 8 GB VRAM.
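A minimal 4-bit loading sketch with bitsandbytes (the quantization settings below are illustrative defaults, not tuned recommendations):
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # use torch.float16 on older GPUs
)
model = AutoModelForCausalLM.from_pretrained(
    "Puchify/PuchifyT2-Pro-8B",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)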
Model Architecture
T2 Pro-8B’s architecture is rooted in Qwen3-8B: 36 transformer layers with a hidden size of 4096, using Grouped-Query Attention with 32 query heads and 8 key/value heads. The feed-forward network size is 16,384. The model employs rotary positional embeddings (RoPE, θ = 1,000,000) and RMSNorm (ε = 1e-6) for robust normalization. Its tokenizer is a SentencePiece-style BPE with a 151,936-token vocabulary, including dedicated start and end tokens.
- Parameter count: 8.2 billion (6.95B non-embedding)
- Context window: 32,768 tokens natively, up to 131,072 tokens with YaRN scaling (see the sketch below)
- Grouped-Query Attention: 32 query heads, 8 key/value heads
- Tokenizer: SentencePiece-style BPE, 151,936-token vocabulary
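A minimal sketch of extending the context window, assuming the model follows the Qwen3 convention of enabling YaRN through the rope_scaling configuration (older transformers releases use the key "type" instead of "rope_type"):
from transformers import AutoConfig, AutoModelForCausalLM

model_path = "Puchify/PuchifyT2-Pro-8B"
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
# factor 4.0 scales the native 32,768-token window to 131,072 tokens
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    device_map="auto",
    trust_remote_code=True,
)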
Repository Contents
The repository includes:
- Sharded model weights in safetensors format
- Shard index mapping
- Architecture and generation configuration files
- Tokenizer assets
- System prompt metadata
- Files for added and special tokens
Training Data
The base model was pre-trained on diverse, large-scale web corpora by Qwen. T2 Pro-8B was then further aligned and instruction-tuned using carefully curated examples focused on safety, advanced reasoning, and policy alignment. No private or proprietary user data was used during fine-tuning.
Intended Use and Limitations
Puchify T2 Pro-8B is engineered for:
- Conversational agents
- Creative writing
- Summarization
- Code generation and explanation
- Educational support
It is not intended for generating hate speech, extremism, explicit abuse, or disinformation; for providing medical or legal advice without expert oversight; or for real-time autonomous decision-making.
While the S.A.F.E framework reduces risks, some biases may persist. Human oversight is required for all deployments.
Licensing and Responsible Use
Puchify T2 Pro-8B is released under the OpenRAIL v1 license with additional terms to encourage broad adoption and responsible innovation. The model and its derivatives may be used for both non-commercial and commercial purposes, including integration into products and services, as long as the name “Puchify T2 Pro-8B” remains prominent and unchanged in all deployments, redistributions, and derivative works. This ensures attribution, transparency, and consistency across the ecosystem.
You may create, quantize, compress, or convert the model and share optimized versions freely or as part of commercial offerings, provided that the original model identity is preserved and clearly attributed. Removal or misrepresentation of the model’s origin is not permitted.
Redistributions must include this documentation and full attribution to Puchify Inc. For further details, see LICENSE.txt.
Citation
If you use Puchify T2 Pro-8B in your work, please cite:
@misc{puchify2025t2-pro-8b,
  title  = {Puchify T2 Pro-8B: Advanced Safe \& Reasoning-centric Hybrid Model},
  author = {Puchify Inc.},
  year   = {2025},
  url    = {https://puchify.ai/models/t2-pro-8b}
}