---

base_model: Qwen/Qwen3-32B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
- fast-reasoning
- efficient-llm
license: apache-2.0
language:
- en
library_name: transformers
---
![image](./image.jpg)

# πŸ”₯ Phoenix β€” Fast Reasoning Qwen3-32B

**Model Name:** `Daemontatox/Phoenix`  
**Developed by:** `Daemontatox`  
**License:** `Apache-2.0`  
**Base Model:** [`unsloth/qwen3-32b`](https://huggingface.co/unsloth/qwen3-32b)  
**Training Stack:** [Unsloth](https://github.com/unslothai/unsloth) + Huggingface [`TRL`](https://github.com/huggingface/trl)

---

## ⚑ What is Phoenix?

**Phoenix** is a fine-tuned Qwen3-32B model designed for **rapid reasoning**, **concise outputs**, and **high-quality results**. It is well suited to chat agents, reasoning backends, and any application where **speed and precision** are critical.

---

## βœ… Key Features

- πŸ” **2Γ— faster training** with Unsloth  
- ⏱️ **Reduced token latency** without compromising answer quality  
- 🎯 Tuned for **instruction-following and reasoning clarity**  
- 🧱 Works with `transformers`, `TGI` (see the serving sketch below), and the `Hugging Face Inference API`

---

## πŸ§ͺ Inference Code (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "Daemontatox/Phoenix"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "Explain the concept of emergence in complex systems in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
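
Qwen3 checkpoints are normally prompted through their chat template rather than a raw string. Here is a minimal variant of the call above, assuming this finetune keeps the base tokenizer's chat template (not verified for this checkpoint):

```python
# Hedged sketch: assumes the tokenizer ships the base Qwen3 chat template.
messages = [{"role": "user", "content": prompt}]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

outputs = model.generate(chat_inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))
```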

---

## 🌐 Inference via Hugging Face API
```python
import requests

API_URL = "https://api-inference.huggingface.co/models/Daemontatox/Phoenix"
headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"}

data = {
  "inputs": "Explain the concept of emergence in complex systems in simple terms.",
  "parameters": {
    "temperature": 0.7,
    "max_new_tokens": 150
  }
}

response = requests.post(API_URL, headers=headers, json=data)
print(response.json()[0]["generated_text"])
```

> ⚠️ Replace `YOUR_HF_API_TOKEN` with your Hugging Face access token.
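
---

## 🚀 Serving with TGI

Since the feature list mentions TGI support, here is a minimal serving sketch. The launch command, port, and endpoint URL are illustrative assumptions, not tested values; adjust GPUs and shards to your hardware.

```python
# Hedged sketch: serve Phoenix with Text Generation Inference (TGI), then
# query it with huggingface_hub's InferenceClient.
#
# Launch the server first (illustrative command; adjust to your setup):
#   docker run --gpus all --shm-size 1g -p 8080:80 \
#     ghcr.io/huggingface/text-generation-inference:latest \
#     --model-id Daemontatox/Phoenix
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed local TGI endpoint

answer = client.text_generation(
    "Explain the concept of emergence in complex systems in simple terms.",
    max_new_tokens=150,
    temperature=0.7,
)
print(answer)
```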




---

## 🧠 Sample Output

**Prompt:**

> "Explain the concept of emergence in complex systems in simple terms."

**Output (Phoenix):**

> "Emergence is when many simple parts work together and create something more complex. For example, birds flying in a flock follow simple rules, but the group moves like one unit. That larger pattern 'emerges' from simple behavior."

---

πŸ“‰ Known Limitations

Large VRAM required for local inference (~64GB+)

Not tuned for multilingual inputs

May not perform well on long-form CoT problems requiring step-wise thought
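
If local VRAM is the blocker, a 4-bit quantized load is one workaround. This is a minimal sketch assuming `bitsandbytes` is installed and a CUDA GPU is available; quantized output quality for this particular finetune has not been verified:

```python
# Hedged sketch: load Phoenix in 4-bit NF4 to cut VRAM roughly 4x vs. bf16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    "Daemontatox/Phoenix",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Daemontatox/Phoenix")
```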



---

πŸ“„ Citation

@misc{daemontatox2025phoenix,
  title={Phoenix: Fast Reasoning Qwen3-32B Finetune},
  author={Daemontatox},
  year={2025},
  note={Trained with Unsloth and Huggingface TRL},
  url={https://huggingface.co/Daemontatox/Phoenix}
}


---