---
license: apache-2.0
datasets:
- GetSoloTech/Code-Reasoning
language:
- en
base_model:
- GetSoloTech/Qwen3-Code-Reasoning-4B
pipeline_tag: text-generation
---

# GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF

This is the GGUF quantized version of the [Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B) model, optimized for competitive programming and code reasoning tasks. The underlying model was fine-tuned on the high-quality Code-Reasoning dataset to improve its ability to solve complex programming problems with detailed reasoning.

## 🚀 Key Features

* **Enhanced Code Reasoning**: Specifically trained on competitive programming problems
* **Thinking Capabilities**: Inherits the advanced reasoning capabilities of the base model
* **High-Quality Solutions**: Trained on solutions with ≥85% test case pass rates
* **Structured Output**: Optimized for generating well-reasoned programming solutions
* **Efficient Inference**: GGUF format enables fast inference on CPU and GPU
* **Multiple Quantization Levels**: Available at various precision levels for different hardware requirements

### Dataset Statistics

* **Split**: Python
* **Source**: High-quality competitive programming problems from TACO, APPS, CodeContests, and Codeforces
* **Quality Filter**: Only correctly solved problems with ≥85% test case pass rates

## 🔧 Usage

### Using with llama.cpp

```bash
# Download the model (choose your preferred quantization)
wget https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF/resolve/main/qwen3-code-reasoning-4b.Q4_K_M.gguf

# Run inference with the llama.cpp CLI binary
./llama-cli -m qwen3-code-reasoning-4b.Q4_K_M.gguf -n 4096 --repeat-penalty 1.1 -p "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.\n\nProblem: Your programming problem here..."
```

### Using with Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=4
)

# Prepare input for a competitive programming problem
prompt = """You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.

Problem: Your programming problem here..."""

# Generate a solution
output = llm(
    prompt,
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repeat_penalty=1.1
)

print(output['choices'][0]['text'])
```
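
If you prefer to have the chat template applied for you, llama-cpp-python can also be driven through `create_chat_completion`, which reads the template from the GGUF metadata. A minimal sketch (the `n_gpu_layers=-1` setting assumes a GPU-enabled build of llama-cpp-python; drop it for CPU-only inference):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU when the wheel was built
# with CUDA/Metal support; on CPU-only builds this argument can be omitted.
llm = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
)

# create_chat_completion applies the model's chat template automatically.
result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": "Problem: Your programming problem here..."},
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
)

print(result["choices"][0]["message"]["content"])
```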

### Using with Ollama

```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./qwen3-code-reasoning-4b.Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create qwen3-code-reasoning -f Modelfile
ollama run qwen3-code-reasoning "Solve this competitive programming problem: [your problem here]"
```
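
Once created, the model is also reachable through Ollama's local REST API (port 11434 by default). A minimal sketch using `requests` against the documented `/api/generate` endpoint (assumes the `ollama create` step above has already been run):

```python
import requests

# Request a single, non-streamed completion from the local Ollama server.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3-code-reasoning",
        "prompt": "Solve this competitive programming problem: [your problem here]",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=600,
)
response.raise_for_status()
print(response.json()["response"])
```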

## 📊 Available Quantizations

| Quantization | Size | Memory Usage | Quality | Use Case |
|--------------|------|--------------|---------|----------|
| Q3_K_M | 2.08 GB | ~3 GB | Good | CPU inference, limited memory |
| Q4_K_M | 2.5 GB | ~4 GB | Better | Balanced performance/memory |
| Q5_K_M | 2.89 GB | ~5 GB | Very Good | High quality, moderate memory |
| Q6_K | 3.31 GB | ~6 GB | Excellent | High quality, more memory |
| Q8_0 | 4.28 GB | ~8 GB | Best | Maximum quality, high memory |
| F16 | 8.05 GB | ~16 GB | Original | Maximum quality, GPU recommended |
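
To fetch a specific quantization programmatically instead of with `wget`, the `huggingface_hub` client can download a single file from this repo. A short sketch (the filename follows the naming shown above; swap in the quantization you want):

```python
from huggingface_hub import hf_hub_download

# Downloads one GGUF file into the local HF cache and returns its path.
model_path = hf_hub_download(
    repo_id="GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF",
    filename="qwen3-code-reasoning-4b.Q4_K_M.gguf",
)
print(model_path)
```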

## 📈 Performance Expectations

This GGUF quantized model maintains the performance characteristics of the original fine-tuned model:

* **Competitive Programming Problems**: Better understanding of problem constraints and requirements
* **Code Generation**: More accurate and efficient solutions
* **Reasoning Quality**: Enhanced step-by-step reasoning for complex problems
* **Solution Completeness**: More comprehensive solutions with proper edge case handling

## 🎛️ Recommended Settings

### For Code Generation

* **Temperature**: 0.7
* **Top-p**: 0.8
* **Top-k**: 20
* **Max New Tokens**: 4096 (adjust based on problem complexity)
* **Repeat Penalty**: 1.1

### For Reasoning Tasks

* **Temperature**: 0.6
* **Top-p**: 0.95
* **Top-k**: 20
* **Max New Tokens**: 8192 (for complex reasoning)
* **Repeat Penalty**: 1.1

Both presets are collected into ready-to-use sampling dictionaries in the sketch below.
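
A minimal sketch of these presets for llama-cpp-python (`llm` is the `Llama` instance from the usage example above; "Max New Tokens" maps to the `max_tokens` keyword):

```python
# Sampling presets taken from the two lists above.
CODE_GENERATION = dict(temperature=0.7, top_p=0.8, top_k=20,
                       max_tokens=4096, repeat_penalty=1.1)
REASONING = dict(temperature=0.6, top_p=0.95, top_k=20,
                 max_tokens=8192, repeat_penalty=1.1)

# Example usage:
# output = llm(prompt, **CODE_GENERATION)
# output = llm(prompt, **REASONING)
```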

## 🛠️ Hardware Requirements

### Minimum Requirements

* **RAM**: 4 GB (for Q3_K_M quantization)
* **Storage**: 2.5 GB free space
* **CPU**: Multi-core processor recommended

### Recommended Requirements

* **RAM**: 8 GB or more
* **Storage**: 5 GB free space
* **GPU**: NVIDIA GPU with 4GB+ VRAM (optional, for faster inference)

## 🤝 Contributing

This GGUF model was converted from the original LoRA fine-tuned model. For questions about:

* The original model: [GetSoloTech/Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B)
* The base model: [Qwen3 GitHub](https://github.com/QwenLM/Qwen3)
* The training dataset: [Code-Reasoning Repository](https://huggingface.co/datasets/GetSoloTech/Code-Reasoning)
* The training framework: [Unsloth Documentation](https://github.com/unslothai/unsloth)

## 📄 License

This model follows the same license as the base model (Apache 2.0). Please refer to the base model license for details.

## 🙏 Acknowledgments

* **Qwen Team** for the excellent base model
* **Unsloth Team** for the efficient training framework
* **NVIDIA Research** for the original OpenCodeReasoning-2 dataset
* **llama.cpp community** for the GGUF format and tools

## 📞 Contact

For questions about this GGUF model, please open an issue in the repository.

---

**Note**: This model is specifically optimized for competitive programming and code reasoning tasks. The GGUF format enables efficient inference on a wide range of hardware while preserving the model's reasoning capabilities.