---
license: apache-2.0
datasets:
- GetSoloTech/Code-Reasoning
language:
- en
base_model:
- GetSoloTech/Qwen3-Code-Reasoning-4B
pipeline_tag: text-generation
---

# GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF

This is the GGUF quantized version of the [Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B) model, optimized for competitive programming and code reasoning tasks. The underlying model was fine-tuned on the high-quality Code-Reasoning dataset to improve its ability to solve complex programming problems with detailed reasoning.

## 🚀 Key Features

* **Enhanced Code Reasoning**: Specifically trained on competitive programming problems
* **Thinking Capabilities**: Inherits the advanced reasoning capabilities of the base model
* **High-Quality Solutions**: Trained on solutions with ≥85% test case pass rates
* **Structured Output**: Optimized for generating well-reasoned programming solutions
* **Efficient Inference**: GGUF format enables fast inference on CPU and GPU
* **Multiple Quantization Levels**: Available at various precision levels for different hardware requirements

### Dataset Statistics

* **Split**: Python
* **Source**: High-quality competitive programming problems from TACO, APPS, CodeContests, and Codeforces
* **Quality Filter**: Only correctly solved problems with ≥85% test case pass rates

## 🔧 Usage

### Using with llama.cpp

```bash
# Download the model (choose your preferred quantization)
wget https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF/resolve/main/qwen3-code-reasoning-4b.Q4_K_M.gguf

# Run inference with the llama.cpp CLI binary
./llama-cli -m qwen3-code-reasoning-4b.Q4_K_M.gguf -n 4096 --repeat-penalty 1.1 -p "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.\n\nProblem: Your programming problem here..."
```

### Using with Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_threads=4
)

# Prepare input for a competitive programming problem
prompt = """You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful.

Problem: Your programming problem here..."""

# Generate a solution
output = llm(
    prompt,
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repeat_penalty=1.1
)

print(output['choices'][0]['text'])
```
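
If you prefer to have the chat template applied for you, llama-cpp-python can also be driven through `create_chat_completion`, which reads the template from the GGUF metadata. A minimal sketch (the `n_gpu_layers=-1` setting assumes a GPU-enabled build of llama-cpp-python; drop it for CPU-only inference):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU when the wheel was built
# with CUDA/Metal support; on CPU-only builds this argument can be omitted.
llm = Llama(
    model_path="./qwen3-code-reasoning-4b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
)

# create_chat_completion applies the model's chat template automatically.
result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert competitive programmer."},
        {"role": "user", "content": "Problem: Your programming problem here..."},
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=0.8,
)

print(result["choices"][0]["message"]["content"])
```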

### Using with Ollama

```bash
# Create a Modelfile
cat > Modelfile << EOF
FROM ./qwen3-code-reasoning-4b.Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
EOF

# Create and run the model
ollama create qwen3-code-reasoning -f Modelfile
ollama run qwen3-code-reasoning "Solve this competitive programming problem: [your problem here]"
```
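
Once created, the model is also reachable through Ollama's local REST API (port 11434 by default). A minimal sketch using `requests` against the documented `/api/generate` endpoint (assumes the `ollama create` step above has already been run):

```python
import requests

# Request a single, non-streamed completion from the local Ollama server.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3-code-reasoning",
        "prompt": "Solve this competitive programming problem: [your problem here]",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=600,
)
response.raise_for_status()
print(response.json()["response"])
```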

## 📊 Available Quantizations

| Quantization | Size | Memory Usage | Quality | Use Case |
|--------------|------|--------------|---------|----------|
| Q3_K_M | 2.08 GB | ~3 GB | Good | CPU inference, limited memory |
| Q4_K_M | 2.5 GB | ~4 GB | Better | Balanced performance/memory |
| Q5_K_M | 2.89 GB | ~5 GB | Very Good | High quality, moderate memory |
| Q6_K | 3.31 GB | ~6 GB | Excellent | High quality, more memory |
| Q8_0 | 4.28 GB | ~8 GB | Best | Maximum quality, high memory |
| F16 | 8.05 GB | ~16 GB | Original | Maximum quality, GPU recommended |
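
To fetch a specific quantization programmatically instead of with `wget`, the `huggingface_hub` client can download a single file from this repo. A short sketch (the filename follows the naming shown above; swap in the quantization you want):

```python
from huggingface_hub import hf_hub_download

# Downloads one GGUF file into the local HF cache and returns its path.
model_path = hf_hub_download(
    repo_id="GetSoloTech/Qwen3-Code-Reasoning-4B-GGUF",
    filename="qwen3-code-reasoning-4b.Q4_K_M.gguf",
)
print(model_path)
```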

## 📈 Performance Expectations

This GGUF quantized model maintains the performance characteristics of the original fine-tuned model:

* **Competitive Programming Problems**: Better understanding of problem constraints and requirements
* **Code Generation**: More accurate and efficient solutions
* **Reasoning Quality**: Enhanced step-by-step reasoning for complex problems
* **Solution Completeness**: More comprehensive solutions with proper edge case handling

## 🎛️ Recommended Settings

### For Code Generation

* **Temperature**: 0.7
* **Top-p**: 0.8
* **Top-k**: 20
* **Max New Tokens**: 4096 (adjust based on problem complexity)
* **Repeat Penalty**: 1.1

### For Reasoning Tasks

* **Temperature**: 0.6
* **Top-p**: 0.95
* **Top-k**: 20
* **Max New Tokens**: 8192 (for complex reasoning)
* **Repeat Penalty**: 1.1

Both presets are collected into ready-to-use sampling dictionaries in the sketch below.
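
A minimal sketch of these presets for llama-cpp-python (`llm` is the `Llama` instance from the usage example above; "Max New Tokens" maps to the `max_tokens` keyword):

```python
# Sampling presets taken from the two lists above.
CODE_GENERATION = dict(temperature=0.7, top_p=0.8, top_k=20,
                       max_tokens=4096, repeat_penalty=1.1)
REASONING = dict(temperature=0.6, top_p=0.95, top_k=20,
                 max_tokens=8192, repeat_penalty=1.1)

# Example usage:
# output = llm(prompt, **CODE_GENERATION)
# output = llm(prompt, **REASONING)
```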

## 🛠️ Hardware Requirements

### Minimum Requirements

* **RAM**: 4 GB (for Q3_K_M quantization)
* **Storage**: 2.5 GB free space
* **CPU**: Multi-core processor recommended

### Recommended Requirements

* **RAM**: 8 GB or more
* **Storage**: 5 GB free space
* **GPU**: NVIDIA GPU with 4GB+ VRAM (optional, for faster inference)

## 🤝 Contributing

This GGUF model was converted from the original LoRA fine-tuned model. For questions about:

* The original model: [GetSoloTech/Qwen3-Code-Reasoning-4B](https://huggingface.co/GetSoloTech/Qwen3-Code-Reasoning-4B)
* The base model: [Qwen3 GitHub](https://github.com/QwenLM/Qwen3)
* The training dataset: [Code-Reasoning Repository](https://huggingface.co/datasets/GetSoloTech/Code-Reasoning)
* The training framework: [Unsloth Documentation](https://github.com/unslothai/unsloth)

## 📄 License

This model follows the same license as the base model (Apache 2.0). Please refer to the base model license for details.

## 🙏 Acknowledgments

* **Qwen Team** for the excellent base model
* **Unsloth Team** for the efficient training framework
* **NVIDIA Research** for the original OpenCodeReasoning-2 dataset
* **llama.cpp community** for the GGUF format and tools

## 📞 Contact

For questions about this GGUF model, please open an issue in the repository.

---

**Note**: This model is specifically optimized for competitive programming and code reasoning tasks. The GGUF format enables efficient inference on a wide range of hardware while preserving the model's reasoning capabilities.