|
--- |
|
|
|
base_model: Qwen/Qwen3-32B |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- qwen3 |
|
- fast-reasoning |
|
- efficient-llm |
|
license: apache-2.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
--- |
|
 |
|
|
|
# π₯ Phoenix β Fast Reasoning Qwen3-32B |
|
|
|
**Model Name:** `Daemontatox/Phoenix` |
|
**Developed by:** `Daemontatox` |
|
**License:** `Apache-2.0` |
|
**Base Model:** [`unsloth/qwen3-32b`](https://huggingface.co/unsloth/qwen3-32b) |
|
**Training Stack:** [Unsloth](https://github.com/unslothai/unsloth) + Huggingface [`TRL`](https://github.com/huggingface/trl) |
|
|
|
--- |
|
|
|
## β‘ What is Phoenix? |
|
|
|
**Phoenix** is a finetuned Qwen3-32B model designed for **rapid reasoning**, **low-token verbosity**, and **high-quality results**. Ideal for chat agents, reasoning backends, and any application where **speed and precision** are critical. |
|
|
|
--- |
|
|
|
## β
Key Features |
|
|
|
- π **2Γ faster training** with Unsloth |
|
- β±οΈ **Reduced token latency** without compromising answer quality |
|
- π― Tuned for **instruction-following and reasoning clarity** |
|
- π§± Works with `transformers`, `TGI`, and `Hugging Face Inference API` |
|
|
|
--- |
|
|
|
## π§ͺ Inference Code (Transformers) |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
import torch |
|
|
|
model_name = "Daemontatox/Phoenix" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) |
|
model = AutoModelForCausalLM.from_pretrained( |
|
model_name, |
|
torch_dtype=torch.bfloat16, |
|
device_map="auto", |
|
trust_remote_code=True |
|
) |
|
|
|
prompt = "Explain the concept of emergence in complex systems in simple terms." |
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
outputs = model.generate(**inputs, max_new_tokens=150, temperature=0.7) |
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
``` |
|
|
|
--- |
|
|
|
π Inference via Hugging Face API |
|
```python |
|
import requests |
|
|
|
API_URL = "https://api-inference.huggingface.co/models/Daemontatox/Phoenix" |
|
headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"} |
|
|
|
data = { |
|
"inputs": "Explain the concept of emergence in complex systems in simple terms.", |
|
"parameters": { |
|
"temperature": 0.7, |
|
"max_new_tokens": 150 |
|
} |
|
} |
|
``` |
|
|
|
response = requests.post(API_URL, headers=headers, json=data) |
|
print(response.json()[0]["generated_text"]) |
|
|
|
> β οΈ Replace YOUR_HF_API_TOKEN with your Hugging Face access token. |
|
|
|
|
|
|
|
|
|
--- |
|
|
|
π§ Sample Output |
|
|
|
Prompt: |
|
|
|
> "Explain the concept of emergence in complex systems in simple terms." |
|
|
|
|
|
|
|
Output (Phoenix): |
|
|
|
> "Emergence is when many simple parts work together and create something more complex. For example, birds flying in a flock follow simple rules, but the group moves like one unit. That larger pattern 'emerges' from simple behavior." |
|
|
|
|
|
|
|
|
|
--- |
|
|
|
π Known Limitations |
|
|
|
Large VRAM required for local inference (~64GB+) |
|
|
|
Not tuned for multilingual inputs |
|
|
|
May not perform well on long-form CoT problems requiring step-wise thought |
|
|
|
|
|
|
|
--- |
|
|
|
π Citation |
|
|
|
@misc{daemontatox2025phoenix, |
|
title={Phoenix: Fast Reasoning Qwen3-32B Finetune}, |
|
author={Daemontatox}, |
|
year={2025}, |
|
note={Trained with Unsloth and Huggingface TRL}, |
|
url={https://huggingface.co/Daemontatox/Phoenix} |
|
} |
|
|
|
|
|
--- |
|
|
|
|