---
base_model: Qwen/Qwen3-32B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
- fast-reasoning
- efficient-llm
license: apache-2.0
language:
- en
library_name: transformers
---

![image](./image.jpg)

# 🔥 Phoenix: Fast Reasoning Qwen3-32B

**Model Name:** `Daemontatox/Phoenix`

**Developed by:** `Daemontatox`

**License:** `Apache-2.0`

**Base Model:** [`unsloth/qwen3-32b`](https://huggingface.co/unsloth/qwen3-32b)

**Training Stack:** [Unsloth](https://github.com/unslothai/unsloth) + Hugging Face [`TRL`](https://github.com/huggingface/trl)

---

## ⚡ What is Phoenix?

**Phoenix** is a fine-tuned Qwen3-32B model designed for **rapid reasoning**, **low-token verbosity**, and **high-quality results**. It is intended for chat agents, reasoning backends, and any application where **speed and precision** are critical.

---

## ✅ Key Features

- 🔁 **2× faster training** with Unsloth
- ⏱️ **Reduced token latency** without compromising answer quality
- 🎯 Tuned for **instruction-following and reasoning clarity**
- 🧱 Works with `transformers`, `TGI`, and the Hugging Face Inference API

---

## 🧪 Inference Code (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "Daemontatox/Phoenix"

# Load the tokenizer and the model weights in bfloat16, sharded across available devices
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "Explain the concept of emergence in complex systems in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Sampling must be enabled for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 🌐 Inference via Hugging Face API

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/Daemontatox/Phoenix"
headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"}

data = {
    "inputs": "Explain the concept of emergence in complex systems in simple terms.",
    "parameters": {
        "temperature": 0.7,
        "max_new_tokens": 150
    }
}

response = requests.post(API_URL, headers=headers, json=data)
response.raise_for_status()  # fail early on HTTP errors such as an invalid token
print(response.json()[0]["generated_text"])
```

> ⚠️ Replace `YOUR_HF_API_TOKEN` with your Hugging Face access token.

---

## 🧠 Sample Output

**Prompt:**

> "Explain the concept of emergence in complex systems in simple terms."

**Output (Phoenix):**

> "Emergence is when many simple parts work together and create something more complex. For example, birds flying in a flock follow simple rules, but the group moves like one unit. That larger pattern 'emerges' from simple behavior."

---

## 📉 Known Limitations

- Large VRAM required for local inference (~64 GB+)
- Not tuned for multilingual inputs
- May not perform well on long-form chain-of-thought (CoT) problems that require step-wise reasoning

---

## 📄 Citation

    @misc{daemontatox2025phoenix,
      title={Phoenix: Fast Reasoning Qwen3-32B Finetune},
      author={Daemontatox},
      year={2025},
      note={Trained with Unsloth and Huggingface TRL},
      url={https://huggingface.co/Daemontatox/Phoenix}
    }

---
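
The ~64 GB VRAM figure listed under Known Limitations can be sanity-checked with back-of-the-envelope arithmetic: parameter count times bytes per parameter. The sketch below is only an illustration, not a measurement. It assumes a nominal 32 billion parameters (taken from the "32B" in the model name), counts the weights alone (no KV cache, activations, or framework overhead), and uses decimal gigabytes. The helper name `weight_memory_gb` and the dtype table are hypothetical, introduced here for the example.

```python
# Rough weight-memory estimate. All numbers are illustrative assumptions,
# not measurements of Phoenix itself.

# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

# Assumption: nominal parameter count implied by the "32B" model name.
NOMINAL_PARAMS = 32e9


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        estimate = weight_memory_gb(NOMINAL_PARAMS, dtype)
        print(f"{dtype:>8}: ~{estimate:.0f} GB for weights only")
```

With `bfloat16`, the dtype used in the Transformers example, this gives about 64 GB for the weights. Actual usage will be higher once the KV cache and activations are included, which is consistent with the "+" in the stated requirement.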