Daemontatox
/

Phoenix

Text Generation

text-generation-inference

Model card Files Files and versions

Phoenix / README.md

Daemontatox's picture

Update README.md

3b40930 verified 5 days ago

|

history blame contribute delete

3.18 kB

	---

	base_model: Qwen/Qwen3-32B
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- qwen3
	- fast-reasoning
	- efficient-llm
	license: apache-2.0
	language:
	- en
	library_name: transformers
	---
	![image](./image.jpg)

	# 🔥 Phoenix — Fast Reasoning Qwen3-32B

	Model Name: `Daemontatox/Phoenix`
	Developed by: `Daemontatox`
	License: `Apache-2.0`
	Base Model: [`unsloth/qwen3-32b`](https://huggingface.co/unsloth/qwen3-32b)
	Training Stack: [Unsloth](https://github.com/unslothai/unsloth) + Huggingface [`TRL`](https://github.com/huggingface/trl)

	---

	## ⚡ What is Phoenix?

	Phoenix is a finetuned Qwen3-32B model designed for rapid reasoning, low-token verbosity, and high-quality results. Ideal for chat agents, reasoning backends, and any application where speed and precision are critical.

	---

	## ✅ Key Features

	- 🔁 2× faster training with Unsloth
	- ⏱️ Reduced token latency without compromising answer quality
	- 🎯 Tuned for instruction-following and reasoning clarity
	- 🧱 Works with `transformers`, `TGI`, and `Hugging Face Inference API`

	---

	## 🧪 Inference Code (Transformers)

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model_name = "Daemontatox/Phoenix"

	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	model_name,
	torch_dtype=torch.bfloat16,
	device_map="auto",
	trust_remote_code=True
	)

	prompt = "Explain the concept of emergence in complex systems in simple terms."
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=150, temperature=0.7)

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	---

	🌐 Inference via Hugging Face API
	```python
	import requests

	API_URL = "https://api-inference.huggingface.co/models/Daemontatox/Phoenix"
	headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"}

	data = {
	"inputs": "Explain the concept of emergence in complex systems in simple terms.",
	"parameters": {
	"temperature": 0.7,
	"max_new_tokens": 150
	}
	}
	```

	response = requests.post(API_URL, headers=headers, json=data)
	print(response.json()[0]["generated_text"])

	> ⚠️ Replace YOUR_HF_API_TOKEN with your Hugging Face access token.




	---

	🧠 Sample Output

	Prompt:

	> "Explain the concept of emergence in complex systems in simple terms."



	Output (Phoenix):

	> "Emergence is when many simple parts work together and create something more complex. For example, birds flying in a flock follow simple rules, but the group moves like one unit. That larger pattern 'emerges' from simple behavior."




	---

	📉 Known Limitations

	Large VRAM required for local inference (~64GB+)

	Not tuned for multilingual inputs

	May not perform well on long-form CoT problems requiring step-wise thought



	---

	📄 Citation

	@misc{daemontatox2025phoenix,
	title={Phoenix: Fast Reasoning Qwen3-32B Finetune},
	author={Daemontatox},
	year={2025},
	note={Trained with Unsloth and Huggingface TRL},
	url={https://huggingface.co/Daemontatox/Phoenix}
	}


	---