
# 🧠 Crystal Think V2 - GGUF Quantized ✨

**Optimized GGUF quantizations for efficient mathematical reasoning**

- 🔗 **Original Model:** PinkPixel/Crystal-Think-V2
- 📦 **Quantized by:** Pink Pixel
- 🏷️ **License:** Apache 2.0
## 📋 About This Repository
This repository contains GGUF quantized versions of Crystal Think V2, an advanced mathematical reasoning model based on Qwen3-4B. These quantized versions are optimized for efficient inference while maintaining excellent mathematical reasoning capabilities.
## 🎯 Original Model Features

- 🧮 **Advanced Mathematical Reasoning** with enhanced chain-of-thought
- 📐 **Multi-step Problem Solving** with clear explanations
- 💻 **Mathematical Code Generation** and algorithm explanation
- 🎯 **Enhanced `<think></think>` Reasoning Format** for structured solutions
- 📊 **85.2% GSM8K accuracy** (+8.8% over base Qwen3-4B)
## 📦 Available Quantizations

| Quantization | File Size | Use Case | Memory Required | Quality |
|---|---|---|---|---|
| Q4_K_M | 2.3GB | Balanced efficiency | ~6GB RAM | Good |
| Q5_K_M | 2.7GB | Better quality | ~7GB RAM | Very Good |
| Q6_K | 3.1GB | High quality | ~8GB RAM | Excellent |
| Q8_0 | 4.0GB | Maximum quality | ~10GB RAM | Near-Original |
💡 **Quantization Guide:**

- **Q4_K_M** - Best for limited hardware, good performance
- **Q5_K_M** - Recommended balance of speed and quality
- **Q6_K** - High quality with reasonable speed
- **Q8_0** - Near-original quality, slower inference
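If you prefer the Hugging Face CLI over `wget`, a single file can be fetched directly. A minimal sketch; the Q4_K_M filename follows the pattern of the Q5_K_M file shown in the Quick Start and is an assumption, so check the repo's file list:

```bash
# Install the CLI, then download one quantization (filename assumed; verify in the repo's "Files" tab)
pip install -U "huggingface_hub[cli]"
huggingface-cli download PinkPixel/Crystal-Think-V2-GGUF \
  crystal-think-v2-q4_k_m.gguf --local-dir .
```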
## 🚀 Quick Start

### Using llama.cpp

```bash
# Download your preferred quantization
wget https://huggingface.co/PinkPixel/Crystal-Think-V2-GGUF/resolve/main/crystal-think-v2-q5_k_m.gguf

# Run with llama.cpp (recent builds name the binary llama-cli rather than main)
./llama.cpp/main -m crystal-think-v2-q5_k_m.gguf \
  -p "Solve this step by step: If x + 2y = 10 and 2x - y = 5, find x and y." \
  -n 512
```
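If llama.cpp was built with GPU support, layers can be offloaded with the `-ngl` (`--n-gpu-layers`) flag; the layer count and prompt here are illustrative:

```bash
# Offload model layers to the GPU; -ngl 99 requests as many layers as will fit
./llama.cpp/main -m crystal-think-v2-q5_k_m.gguf -ngl 99 \
  -p "What is 17 * 23?" -n 256
```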
### Using llama-cpp-python

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="crystal-think-v2-q5_k_m.gguf",
    n_ctx=4096,    # Context length
    n_threads=8,   # CPU threads
    verbose=False,
)

# Mathematical reasoning example
prompt = """Solve this step by step:
A rectangle has a length that is 3 more than twice its width. If the perimeter is 42 cm, what are the dimensions?

Use <think></think> for your reasoning."""

response = llm(
    prompt,
    max_tokens=512,
    temperature=0.7,
    stop=["</SOLUTION>", "<|endoftext|>"],
)

print(response["choices"][0]["text"])
```
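llama-cpp-python also exposes an OpenAI-style chat API via `create_chat_completion`. A minimal sketch; the system-prompt wording is an assumption, not taken from the model card:

```python
# Chat-style call against the same loaded model
chat = llm.create_chat_completion(
    messages=[
        # Assumed system prompt; adjust to your use case
        {"role": "system", "content": "You are a careful math tutor. Reason inside <think></think>."},
        {"role": "user", "content": "Solve: 3x + 2y = 12 and x - y = 1."},
    ],
    max_tokens=512,
    temperature=0.7,
)
print(chat["choices"][0]["message"]["content"])
```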
### Using Ollama

```bash
# Create Modelfile
echo 'FROM ./crystal-think-v2-q5_k_m.gguf' > Modelfile

# Create Ollama model
ollama create crystal-think-v2 -f Modelfile

# Run the model
ollama run crystal-think-v2 "What is the derivative of x^3 + 2x^2 - 5?"
```
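For repeatable runs, the Modelfile can also pin sampling parameters and a system prompt. The values and prompt below are suggestions, not tuned settings from the model card:

```
FROM ./crystal-think-v2-q5_k_m.gguf

# Suggested defaults; adjust to taste
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

SYSTEM """You are a mathematical reasoning assistant. Think step by step inside <think></think> and give the final answer inside <SOLUTION></SOLUTION>."""
```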
## 🎯 Enhanced Reasoning Format

Crystal Think V2 uses a structured reasoning approach:

```
<think>
[Step-by-step reasoning process]
- Variable definitions
- Equation setup
- Mathematical operations
- Verification steps
</think>

<SOLUTION>
[Final organized answer]
1) Specific results
2) Numerical values
3) Units and context
</SOLUTION>
```
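When consuming output programmatically it helps to split these sections apart. A minimal sketch using the tag names above, applied to the `response` from the Quick Start:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Extract the <think> trace and the <SOLUTION> block from model output."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    solution = re.search(r"<SOLUTION>(.*?)</SOLUTION>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        # Fall back to the full text if the closing tag was cut off by a stop string
        solution.group(1).strip() if solution else text.strip(),
    )

reasoning, answer = split_reasoning(response["choices"][0]["text"])
print(answer)
```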
## 📊 Performance Benchmarks

### Original Model Performance

| Benchmark | Score | Improvement over Base |
|---|---|---|
| GSM8K | 85.2% | +8.8% |
| MATH | 42.1% | +10.4% |
| Algebra | 78.9% | +13.7% |
| Geometry | 71.3% | +12.5% |
| Code Math | 82.6% | +13.5% |
### GGUF Quantization Impact

- **Q8_0:** ~99% of original performance
- **Q6_K:** ~97% of original performance
- **Q5_K_M:** ~95% of original performance
- **Q4_K_M:** ~92% of original performance
## 💻 Hardware Requirements

### Minimum Requirements

| Quantization | RAM | VRAM (GPU) | CPU |
|---|---|---|---|
| Q4_K_M | 6GB | 4GB | 4 cores |
| Q5_K_M | 7GB | 5GB | 4 cores |
| Q6_K | 8GB | 6GB | 6 cores |
| Q8_0 | 10GB | 8GB | 8 cores |
### Recommended for Best Performance

- **CPU:** Modern 8+ core processor
- **RAM:** 16GB+ system memory
- **GPU:** 8GB+ VRAM (optional, for GPU acceleration)
## 🔧 Installation & Dependencies

### llama.cpp

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
```
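Recent llama.cpp releases have moved from the Makefile to CMake, so if `make` fails, the CMake build should work instead:

```bash
# CMake build (current upstream method)
cmake -B build
cmake --build build --config Release
```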
### llama-cpp-python

```bash
pip install llama-cpp-python

# For GPU support (optional); older releases used -DLLAMA_CUBLAS=on instead
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```
### Ollama

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
```
## 📚 Usage Examples

### Basic Mathematical Problem

**Input:** "What is the integral of 2x + 3?"
**Expected:** Step-by-step integration arriving at x² + 3x + C

### Complex Word Problem

**Input:** "A train travels 120 miles in 2 hours, then 180 miles in 3 hours. What's the average speed?"
**Expected:** Detailed solution: 300 total miles over 5 total hours, so 60 mph

### Algebraic Reasoning

**Input:** "Solve the system: 3x + 2y = 12, x - y = 1"
**Expected:** Systematic solution using substitution or elimination (x = 2.8, y = 1.8)
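These examples can be scripted against the `llm` object from the Quick Start; a small sketch, with the step-by-step framing prompt as an assumption:

```python
# Run the example prompts through the model loaded in the Quick Start
examples = [
    "What is the integral of 2x + 3?",
    "A train travels 120 miles in 2 hours, then 180 miles in 3 hours. What's the average speed?",
    "Solve the system: 3x + 2y = 12, x - y = 1",
]

for question in examples:
    out = llm(f"Solve this step by step:\n{question}", max_tokens=512, temperature=0.7)
    print(out["choices"][0]["text"], "\n" + "-" * 40)
```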
## 🔗 Related Links

- 🏠 **Original Model:** PinkPixel/Crystal-Think-V2
- 📖 **Model Documentation:** Crystal Think V2 README
- 🛠️ **llama.cpp:** https://github.com/ggerganov/llama.cpp
- 🐍 **llama-cpp-python:** PyPI Package
## ⚠️ Limitations

- **Domain Focus:** Optimized for mathematical reasoning; may be less effective for general conversation
- **Quantization Trade-offs:** Lower quantizations may show reduced accuracy on complex problems
- **Language:** Primarily trained on English mathematical content
- **Hardware Dependency:** Performance varies significantly with hardware specifications
## 📈 Benchmarking Your Setup

Test your quantization choice with this sample problem:

**Prompt:** "A rectangular garden has a length that is 4 meters more than twice its width. The garden is surrounded by a walkway that is 2 meters wide on all sides. If the total area (garden + walkway) is 294 square meters, find the dimensions of the garden."

**Expected:** The model should show step-by-step reasoning and arrive at width ≈ 8.12 m, length ≈ 20.25 m (exactly, width = 7√3 − 4).
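The expected answer is easy to verify: with width w, the garden is w × (2w + 4), and the 2 m walkway adds 4 m to each dimension, so (w + 4)(2w + 8) = 294:

```python
import math

# (w + 4)(2w + 8) = 294  =>  2(w + 4)^2 = 294  =>  w = sqrt(147) - 4
w = math.sqrt(294 / 2) - 4
l = 2 * w + 4
assert abs((w + 4) * (l + 4) - 294) < 1e-9  # total area checks out
print(f"width ≈ {w:.2f} m, length ≈ {l:.2f} m")  # width ≈ 8.12 m, length ≈ 20.25 m
```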
## 🤝 Contributing
Found an issue with the quantizations or have suggestions for improvements? Please open an issue or reach out!
## 📧 Contact & Support

- **Developer:** Pink Pixel
- **GitHub:** https://github.com/pinkpixel-dev
- **Website:** https://pinkpixel.dev
- **Email:** [email protected]
## 🙏 Acknowledgments

- **Original Model:** Crystal Think V2 by Pink Pixel
- **Base Model:** Qwen/Qwen3-4B by Qwen Team
- **Quantization Tools:** llama.cpp by Georgi Gerganov
- **Training Dataset:** NVIDIA OpenMathReasoning

Made with ❤️ by Pink Pixel ✨

*"Dream it, Pixel it"*