Daemontatox committed on
Commit 5b793f4 · verified · 1 Parent(s): f48b32e

Update README.md

Files changed (1): README.md (+71 −29)

README.md CHANGED
@@ -13,10 +13,9 @@ language:
  - en
  library_name: transformers
  ---
-
  ![image](./image.jpg)
 
- # 🔥 Daemontatox/Phoenix — Fast Reasoning Qwen3-32B
 
  **Model Name:** `Daemontatox/Phoenix`
  **Developed by:** `Daemontatox`
@@ -28,58 +27,101 @@ library_name: transformers
 
  ## ⚡ What is Phoenix?
 
- **Phoenix** is an optimized variant of Qwen3-32B designed to **think less, answer faster**, and **maintain equal accuracy**.
-
- Finetuned for **low-latency, high-clarity generation**, Phoenix cuts down on verbose chains of thought and delivers **high-quality outputs with fewer tokens** — ideal for real-time AI agents, chatbots, and production inference.
-
- > 🧠 *Same quality. Less thinking. Faster decisions.*
 
  ---
 
  ## ✅ Key Features
 
- - 🔄 **2× faster training** using Unsloth optimizations
- - ⚡ **Low token-latency reasoning** — minimized "thought preamble"
- - 🧪 **Instruction-tuned** for direct, correct, high-quality completions
- - 🧱 Works seamlessly with TGI, Transformers, and most inference stacks
 
  ---
 
- ## 🛠️ Finetuning Details
 
- - **Base**: `unsloth/qwen3-32b`
- - **Objective**: Reduce token bloat in reasoning tasks
- - **Method**: TRL + Unsloth + curated instruction/reasoning dataset
- - **Frameworks**: PyTorch, TRL, Unsloth
 
  ---
 
- ## 🧠 Intended Use
 
- Phoenix is ideal for:
 
- - 🕹️ Agentic LLM systems
- - 💬 High-performance chat interfaces
- - 📉 Token-optimized inference environments
- - 🔁 Real-time reasoning pipelines
- - 🧠 Cognitive task simulations
 
  ---
 
- ## 📉 Limitations
 
- - Still resource-intensive (32B scale)
- - Primary training language: English
- - May sacrifice detail in long-form chain-of-thought explanations
 
  ---
 
  ---
 
- ## 📄 Citation
 
- ```bibtex
  @misc{daemontatox2025phoenix,
    title={Phoenix: Fast Reasoning Qwen3-32B Finetune},
    author={Daemontatox},
  - en
  library_name: transformers
  ---
 
  ![image](./image.jpg)
 
+ # 🔥 Phoenix — Fast Reasoning Qwen3-32B
 
  **Model Name:** `Daemontatox/Phoenix`
  **Developed by:** `Daemontatox`
 
 
  ## ⚡ What is Phoenix?
 
+ **Phoenix** is a finetuned Qwen3-32B model designed for **rapid reasoning**, **low-token verbosity**, and **high-quality results**. Ideal for chat agents, reasoning backends, and any application where **speed and precision** are critical.
 
  ---
 
  ## ✅ Key Features
 
+ - 🔁 **2× faster training** with Unsloth
+ - ⏱️ **Reduced token latency** without compromising answer quality
+ - 🎯 Tuned for **instruction-following and reasoning clarity**
+ - 🧱 Works with `transformers`, `TGI`, and the `Hugging Face Inference API`
 
  ---
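Since the feature list advertises TGI compatibility, a minimal deployment sketch with Text Generation Inference's official Docker image may help. The port, volume path, and request payload below are illustrative assumptions, not taken from the model card; a 32B model needs substantial GPU memory (or an additional quantization flag).

```shell
# Serve Phoenix with Text Generation Inference (TGI) via Docker.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id Daemontatox/Phoenix

# Then query the server's generate endpoint:
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 50}}'
```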
 
+ ## 🧪 Inference Code (Transformers)
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_name = "Daemontatox/Phoenix"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True
+ )
+
+ prompt = "Explain the concept of emergence in complex systems in simple terms."
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
 
  ---
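Qwen3-family models wrap their chain-of-thought in `<think>…</think>` tags, and Phoenix is tuned to minimize that preamble; when post-processing outputs you may still want to drop any residual reasoning block. A stdlib-only sketch — the helper name `strip_think` is mine, not part of the model card:

```python
import re

def strip_think(text: str) -> str:
    """Drop a leading <think>...</think> reasoning block, if present,
    and return only the final answer text."""
    return re.sub(r"<think>.*?</think>\s*", "", text, count=1, flags=re.DOTALL).strip()

# Works whether or not the model emitted a reasoning block:
print(strip_think("<think>flocking rules...</think>Emergence is simple parts creating complex wholes."))
print(strip_think("Emergence is simple parts creating complex wholes."))
```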
 
+ ## 🌐 Inference via Hugging Face API
+
+ ```python
+ import requests
+
+ API_URL = "https://api-inference.huggingface.co/models/Daemontatox/Phoenix"
+ headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"}
+
+ data = {
+     "inputs": "Explain the concept of emergence in complex systems in simple terms.",
+     "parameters": {
+         "temperature": 0.7,
+         "max_new_tokens": 150
+     }
+ }
+
+ response = requests.post(API_URL, headers=headers, json=data)
+ print(response.json()[0]["generated_text"])
+ ```
+
+ > ⚠️ Replace `YOUR_HF_API_TOKEN` with your Hugging Face access token.
 
  ---
 
+ ## 🧠 Sample Output
+
+ **Prompt:**
+
+ > "Explain the concept of emergence in complex systems in simple terms."
+
+ **Output (Phoenix):**
+
+ > "Emergence is when many simple parts work together and create something more complex. For example, birds flying in a flock follow simple rules, but the group moves like one unit. That larger pattern 'emerges' from simple behavior."
 
  ---
 
+ ## 📉 Known Limitations
+
+ - Large VRAM required for local inference (~64 GB+)
+ - Not tuned for multilingual inputs
+ - May not perform well on long-form chain-of-thought (CoT) problems requiring step-wise reasoning
 
  ---
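The ~64 GB VRAM figure above is consistent with the bfloat16 weight footprint alone, which a quick back-of-envelope calculation confirms:

```python
# Weights-only VRAM estimate for a 32B-parameter model in bfloat16.
params = 32e9          # parameter count
bytes_per_param = 2    # bfloat16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for weights, before KV cache and activation overhead")
```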
 
+ ## 📄 Citation
 
  @misc{daemontatox2025phoenix,
    title={Phoenix: Fast Reasoning Qwen3-32B Finetune},
    author={Daemontatox},