|
---
license: apache-2.0
language:
- en
- zh
- es
base_model:
- shuttleai/shuttle-3.5
tags:
- Qwen
- Shuttle
- GGUF
- 32b
- quantized
- Q8_0
---
|
|
|
# Shuttle 3.5 – Q8_0 GGUF Quant
|
|
|
This repo contains a GGUF quantized version of [ShuttleAI's Shuttle 3.5 model](https://huggingface.co/shuttleai/shuttle-3.5), a high-performance instruction-tuned variant of Qwen 3 32B. This quant was built for efficient local inference without sacrificing quality. |
|
|
|
## 🔗 Base Model
|
|
|
- **Original**: [shuttleai/shuttle-3.5](https://huggingface.co/shuttleai/shuttle-3.5)
- **Parent architecture**: Qwen 3 32B
- **Quantized by**: Lex-au
- **Quantization format**: GGUF Q8_0
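
Q8_0 stores weights in blocks of 32 values, each block holding one scale plus 32 signed 8-bit integers. The following is a minimal Python sketch of that idea for illustration only; it is not llama.cpp's actual implementation, and the function names are invented here:

```python
QK8_0 = 32  # values per Q8_0 block

def quantize_q8_0(block):
    """Quantize one block of 32 floats to (scale, list of int8 values)."""
    amax = max(abs(x) for x in block)
    d = amax / 127.0 if amax > 0 else 0.0   # per-block scale
    inv = 1.0 / d if d > 0 else 0.0
    qs = [round(x * inv) for x in block]    # integers in [-127, 127]
    return d, qs

def dequantize_q8_0(d, qs):
    """Reconstruct approximate floats from a quantized block."""
    return [d * q for q in qs]

# Round-trip a toy block of 32 weights.
weights = [0.5, -1.0, 0.25, 0.0] * 8
d, qs = quantize_q8_0(weights)
restored = dequantize_q8_0(d, qs)
err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={d:.6f}, max abs error={err:.6f}")
```

Because every value keeps 8 bits plus a shared scale, the worst-case rounding error per weight is half a quantization step, which is why Q8_0 is close to lossless in practice.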
|
|
|
## 📦 Model Size
|
|
|
| Format | Size |
|----------|----------|
| Original (safetensors, F16) | 65.52 GB |
| Q8_0 (GGUF) | 34.8 GB |
|
|
|
**Compression ratio**: ~1.9× (Q8_0 is ~53% of the F16 size)  
**Size reduction**: ~47% (30.72 GB saved)
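
The figures above follow directly from the two sizes in the table; a quick arithmetic check:

```python
# Size figures (GB) taken from the table above.
f16_gb = 65.52
q8_gb = 34.8

saved_gb = f16_gb - q8_gb        # absolute savings in GB
reduction = saved_gb / f16_gb    # fraction of the original size removed
ratio = f16_gb / q8_gb           # compression ratio

print(f"saved: {saved_gb:.2f} GB, reduction: {reduction:.0%}, ratio: {ratio:.2f}x")
```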
|
|
|
## 🧪 Quality
|
|
|
- Q8_0 is **near-lossless**, preserving almost all performance of the full-precision model. |
|
- Ideal for high-quality inference on capable consumer hardware. |
|
|
|
## 🚀 Usage
|
|
|
Compatible with all major GGUF-supporting runtimes, including: |
|
|
|
- `llama.cpp`
- `KoboldCPP`
- `text-generation-webui`
- `llamafile`
- `LM Studio`
|
|
|
Example with `llama.cpp`: |
|
|
|
```bash
./main -m shuttle-3.5.Q8_0.gguf --ctx-size 4096 --threads 16 --prompt "Describe the effects of quantum decoherence in plain English."
```

Recent `llama.cpp` builds name the binary `llama-cli` rather than `main`; the flags are otherwise the same.