---
base_model: Qwen/Qwen3-32B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
- fast-reasoning
- efficient-llm
license: apache-2.0
language:
- en
library_name: transformers
---

# 🔥 Phoenix – Fast Reasoning Qwen3-32B
**Model Name:** `Daemontatox/Phoenix`
**Developed by:** `Daemontatox`
**License:** `Apache-2.0`
**Base Model:** [`unsloth/qwen3-32b`](https://huggingface.co/unsloth/qwen3-32b)
**Training Stack:** [Unsloth](https://github.com/unslothai/unsloth) + Huggingface [`TRL`](https://github.com/huggingface/trl)
---
## ⚡ What is Phoenix?
**Phoenix** is a finetuned Qwen3-32B model designed for **rapid reasoning**, **concise outputs**, and **high-quality results**. It is well suited to chat agents, reasoning backends, and any application where **speed and precision** are critical.
---
## ✅ Key Features
- 🚀 **2× faster training** with Unsloth
- ⏱️ **Reduced token latency** without compromising answer quality
- 🎯 Tuned for **instruction-following and reasoning clarity**
- 🧱 Works with `transformers`, `TGI`, and the `Hugging Face Inference API` (see the TGI sketch below)
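
If you serve Phoenix with [TGI](https://github.com/huggingface/text-generation-inference), the endpoint can be queried with `huggingface_hub`'s `InferenceClient`. A minimal sketch, assuming a TGI server is already running at `http://localhost:8080` (the URL and launch command below are illustrative, not part of this repo):

```python
# Minimal TGI client sketch. Assumes a server was started with something like:
#   docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference \
#       --model-id Daemontatox/Phoenix
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed local endpoint

output = client.text_generation(
    "Explain the concept of emergence in complex systems in simple terms.",
    max_new_tokens=150,
    temperature=0.7,
)
print(output)
```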
---
## 🧪 Inference Code (Transformers)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "Daemontatox/Phoenix"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # bf16 halves memory vs. fp32
    device_map="auto",           # spread layers across available GPUs
    trust_remote_code=True,
)

prompt = "Explain the concept of emergence in complex systems in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
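
For interactive applications, streaming tokens as they are generated lowers perceived latency. A minimal sketch using `transformers`' `TextStreamer`, reusing the `model` and `tokenizer` loaded above (the prompt is illustrative):

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they arrive; skip_prompt avoids echoing the input.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("Summarize the idea of emergence in two sentences.", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7, streamer=streamer)
```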
---
## 🌐 Inference via Hugging Face API
```python
import requests

API_URL = "https://api-inference.huggingface.co/models/Daemontatox/Phoenix"
headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"}

data = {
    "inputs": "Explain the concept of emergence in complex systems in simple terms.",
    "parameters": {
        "temperature": 0.7,
        "max_new_tokens": 150,
    },
}

response = requests.post(API_URL, headers=headers, json=data)
print(response.json()[0]["generated_text"])
```
> ⚠️ Replace `YOUR_HF_API_TOKEN` with your Hugging Face access token.
---
## 🧠 Sample Output

**Prompt:**
> "Explain the concept of emergence in complex systems in simple terms."

**Output (Phoenix):**
> "Emergence is when many simple parts work together and create something more complex. For example, birds flying in a flock follow simple rules, but the group moves like one unit. That larger pattern 'emerges' from simple behavior."
---
## 📌 Known Limitations
- Large VRAM required for local inference (~64 GB+ in bf16); see the quantization sketch below
- Not tuned for multilingual inputs
- May underperform on long-form chain-of-thought problems requiring step-wise reasoning
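
To work around the VRAM requirement, a 4-bit quantized load may be an option, at some cost in output quality. A minimal sketch, assuming `bitsandbytes` is installed; the NF4 settings are illustrative defaults, not an official recommendation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative 4-bit NF4 config: weight memory drops from ~64 GB (bf16)
# to roughly 18-20 GB for a 32B model, plus activation overhead.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Daemontatox/Phoenix",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Daemontatox/Phoenix", trust_remote_code=True)
```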
---
## 📚 Citation

```bibtex
@misc{daemontatox2025phoenix,
  title={Phoenix: Fast Reasoning Qwen3-32B Finetune},
  author={Daemontatox},
  year={2025},
  note={Trained with Unsloth and Huggingface TRL},
  url={https://huggingface.co/Daemontatox/Phoenix}
}
```
---