---

base_model: Qwen/Qwen3-32B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
- fast-reasoning
- efficient-llm
license: apache-2.0
language:
- en
library_name: transformers
---
![image](./image.jpg)

# 🔥 Phoenix — Fast Reasoning Qwen3-32B

**Model Name:** `Daemontatox/Phoenix`  
**Developed by:** `Daemontatox`  
**License:** `Apache-2.0`  
**Base Model:** [`unsloth/qwen3-32b`](https://huggingface.co/unsloth/qwen3-32b)  
**Training Stack:** [Unsloth](https://github.com/unslothai/unsloth) + Huggingface [`TRL`](https://github.com/huggingface/trl)

---

## ⚡ What is Phoenix?

**Phoenix** is a finetuned Qwen3-32B model designed for **rapid reasoning**, **low-token verbosity**, and **high-quality results**. Ideal for chat agents, reasoning backends, and any application where **speed and precision** are critical.

---

## ✅ Key Features

- 🔁 **2× faster training** with Unsloth  
- ⏱️ **Reduced token latency** without compromising answer quality  
- 🎯 Tuned for **instruction-following and reasoning clarity**  
- 🧱 Works with `transformers`, `TGI`, and `Hugging Face Inference API`

---

## 🧪 Inference Code (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "Daemontatox/Phoenix"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "Explain the concept of emergence in complex systems in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

🌐 Inference via Hugging Face API
```python
import requests

API_URL = "https://api-inference.huggingface.co/models/Daemontatox/Phoenix"
headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"}

data = {
  "inputs": "Explain the concept of emergence in complex systems in simple terms.",
  "parameters": {
    "temperature": 0.7,
    "max_new_tokens": 150
  }
}
```

response = requests.post(API_URL, headers=headers, json=data)
print(response.json()[0]["generated_text"])

> ⚠️ Replace YOUR_HF_API_TOKEN with your Hugging Face access token.


---

🧠 Sample Output

Prompt:

> "Explain the concept of emergence in complex systems in simple terms."


Output (Phoenix):

> "Emergence is when many simple parts work together and create something more complex. For example, birds flying in a flock follow simple rules, but the group moves like one unit. That larger pattern 'emerges' from simple behavior."


---

📉 Known Limitations

Large VRAM required for local inference (~64GB+)

Not tuned for multilingual inputs

May not perform well on long-form CoT problems requiring step-wise thought


---

📄 Citation

@misc{daemontatox2025phoenix,
  title={Phoenix: Fast Reasoning Qwen3-32B Finetune},
  author={Daemontatox},
  year={2025},
  note={Trained with Unsloth and Huggingface TRL},
  url={https://huggingface.co/Daemontatox/Phoenix}
}


---