Daemontatox committed on
Commit 5b793f4 · verified · 1 Parent(s): f48b32e

Update README.md

Files changed (1): README.md (+71 −29)

README.md CHANGED
@@ -13,10 +13,9 @@ language:
  - en
  library_name: transformers
  ---
-
  ![image](./image.jpg)
 
- # 🔥 Daemontatox/Phoenix — Fast Reasoning Qwen3-32B
 
  **Model Name:** `Daemontatox/Phoenix`
  **Developed by:** `Daemontatox`
@@ -28,58 +27,101 @@ library_name: transformers
 
  ## ⚡ What is Phoenix?
 
- **Phoenix** is an optimized variant of Qwen3-32B designed to **think less, answer faster**, and **maintain equal accuracy**.
-
- Finetuned for **low-latency, high-clarity generation**, Phoenix cuts down on verbose chains of thought and delivers **high-quality outputs with fewer tokens** — ideal for real-time AI agents, chatbots, and production inference.
-
- > 🧠 *Same quality. Less thinking. Faster decisions.*
 
  ---
 
  ## ✅ Key Features
 
- - 🔄 **2× faster training** using Unsloth optimizations
- - ⚡ **Low token-latency reasoning** — minimized "thought preamble"
- - 🧪 **Instruction-tuned** for direct, correct, high-quality completions
- - 🧱 Works seamlessly with TGI, Transformers, and most inference stacks
 
  ---
 
- ## 🛠️ Finetuning Details
 
- - **Base**: `unsloth/qwen3-32b`
- - **Objective**: Reduce token bloat in reasoning tasks
- - **Method**: TRL + Unsloth + curated instruction/reasoning dataset
- - **Frameworks**: PyTorch, TRL, Unsloth
 
  ---
 
- ## 🧠 Intended Use
 
- Phoenix is ideal for:
 
- - 🕹️ Agentic LLM systems
- - 💬 High-performance chat interfaces
- - 📉 Token-optimized inference environments
- - 🔁 Real-time reasoning pipelines
- - 🧠 Cognitive task simulations
 
  ---
 
- ## 📉 Limitations
 
- - Still resource-intensive (32B scale)
- - Primary training language: English
- - May sacrifice detail in long-form chain-of-thought explanations
 
  ---
 
  ---
 
- ## 📄 Citation
 
- ```bibtex
  @misc{daemontatox2025phoenix,
    title={Phoenix: Fast Reasoning Qwen3-32B Finetune},
    author={Daemontatox},
  - en
  library_name: transformers
  ---
 
  ![image](./image.jpg)
 
+ # 🔥 Phoenix — Fast Reasoning Qwen3-32B
 
  **Model Name:** `Daemontatox/Phoenix`
  **Developed by:** `Daemontatox`
 
 
  ## ⚡ What is Phoenix?
 
+ **Phoenix** is a finetuned Qwen3-32B model designed for **rapid reasoning**, **low-token verbosity**, and **high-quality results**. Ideal for chat agents, reasoning backends, and any application where **speed and precision** are critical.
 
  ---
 
  ## ✅ Key Features
 
+ - 🔁 **2× faster training** with Unsloth
+ - ⏱️ **Reduced token latency** without compromising answer quality
+ - 🎯 Tuned for **instruction-following and reasoning clarity**
+ - 🧱 Works with `transformers`, `TGI`, and the `Hugging Face Inference API`
 
  ---
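Since the feature list advertises TGI compatibility, a minimal deployment sketch with Text Generation Inference's official Docker image may help. The port, volume path, and request payload below are illustrative assumptions, not taken from the model card; a 32B model needs substantial GPU memory (or an additional quantization flag).

```shell
# Serve Phoenix with Text Generation Inference (TGI) via Docker.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id Daemontatox/Phoenix

# Then query the server's generate endpoint:
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 50}}'
```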
 
+ ## 🧪 Inference Code (Transformers)
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_name = "Daemontatox/Phoenix"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True
+ )
+
+ prompt = "Explain the concept of emergence in complex systems in simple terms."
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
 
  ---
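Qwen3-family models wrap their chain-of-thought in `<think>…</think>` tags, and Phoenix is tuned to minimize that preamble; when post-processing outputs you may still want to drop any residual reasoning block. A stdlib-only sketch — the helper name `strip_think` is mine, not part of the model card:

```python
import re

def strip_think(text: str) -> str:
    """Drop a leading <think>...</think> reasoning block, if present,
    and return only the final answer text."""
    return re.sub(r"<think>.*?</think>\s*", "", text, count=1, flags=re.DOTALL).strip()

# Works whether or not the model emitted a reasoning block:
print(strip_think("<think>flocking rules...</think>Emergence is simple parts creating complex wholes."))
print(strip_think("Emergence is simple parts creating complex wholes."))
```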
 
+ ## 🌐 Inference via Hugging Face API
+
+ ```python
+ import requests
+
+ API_URL = "https://api-inference.huggingface.co/models/Daemontatox/Phoenix"
+ headers = {"Authorization": "Bearer YOUR_HF_API_TOKEN"}
+
+ data = {
+     "inputs": "Explain the concept of emergence in complex systems in simple terms.",
+     "parameters": {
+         "temperature": 0.7,
+         "max_new_tokens": 150
+     }
+ }
+
+ response = requests.post(API_URL, headers=headers, json=data)
+ print(response.json()[0]["generated_text"])
+ ```
+
+ > ⚠️ Replace `YOUR_HF_API_TOKEN` with your Hugging Face access token.
 
  ---
 
+ ## 🧠 Sample Output
+
+ **Prompt:**
+
+ > "Explain the concept of emergence in complex systems in simple terms."
+
+ **Output (Phoenix):**
+
+ > "Emergence is when many simple parts work together and create something more complex. For example, birds flying in a flock follow simple rules, but the group moves like one unit. That larger pattern 'emerges' from simple behavior."
 
  ---
 
+ ## 📉 Known Limitations
+
+ - Large VRAM required for local inference (~64 GB+)
+ - Not tuned for multilingual inputs
+ - May not perform well on long-form chain-of-thought (CoT) problems requiring step-wise reasoning
 
  ---
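The ~64 GB VRAM figure above is consistent with the bfloat16 weight footprint alone, which a quick back-of-envelope calculation confirms:

```python
# Weights-only VRAM estimate for a 32B-parameter model in bfloat16.
params = 32e9          # parameter count
bytes_per_param = 2    # bfloat16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for weights, before KV cache and activation overhead")
```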
 
+ ## 📄 Citation
 
  @misc{daemontatox2025phoenix,
    title={Phoenix: Fast Reasoning Qwen3-32B Finetune},
    author={Daemontatox},