---
license: apache-2.0
language:
- en
- zh
- es
base_model:
- shuttleai/shuttle-3.5
tags:
- Qwen
- Shuttle
- GGUF
- 32b
- quantized
- Q8_0
---

# Shuttle 3.5 - Q8_0 GGUF Quant
This repo contains a GGUF quantized version of ShuttleAI's Shuttle 3.5 model, a high-performance instruction-tuned variant of Qwen 3 32B. This quant was built for efficient local inference without sacrificing quality.
## Base Model
- Original: shuttleai/shuttle-3.5
- Parent architecture: Qwen 3 32B
- Quantized by: Lex-au
- Quantization format: GGUF Q8_0
## Model Size
| Format | Size |
|---|---|
| Original (safetensors, F16) | 65.52 GB |
| Q8_0 (GGUF) | 34.8 GB |
**Size reduction:** ~47% (30.72 GB saved); the Q8_0 file is roughly 53% of the original's size.
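The Q8_0 figure lines up with the format's storage cost: llama.cpp's Q8_0 stores weights in blocks of 32 int8 values plus one f16 scale per block (34 bytes per 32 weights, i.e. 8.5 bits per weight). A rough back-of-the-envelope check, assuming ~32.8B parameters for Qwen 3 32B (the parameter count is an approximation, not taken from this repo):

```python
# Sketch of a Q8_0 size estimate; the real file also contains GGUF metadata
# and a few tensors kept at higher precision, so expect small deviations.
params = 32.8e9         # assumed parameter count for Qwen 3 32B
bits_per_weight = 8.5   # Q8_0: 32 int8 quants + 1 f16 scale per 32-weight block
size_gb = params * bits_per_weight / 8 / 1e9
print(f"Estimated Q8_0 size: {size_gb:.1f} GB")  # close to the 34.8 GB file above
```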
## Quality
- Q8_0 is near-lossless, preserving almost all performance of the full-precision model.
- Ideal for high-quality inference on capable consumer hardware.
## Usage
Compatible with all major GGUF-supporting runtimes, including:
- llama.cpp
- KoboldCPP
- text-generation-webui
- llamafile
- LM Studio
### Example with llama.cpp

```bash
# Note: newer llama.cpp builds ship this binary as `llama-cli` rather than `main`.
./main -m shuttle-3.5.Q8_0.gguf --ctx-size 4096 --threads 16 --prompt "Describe the effects of quantum decoherence in plain English."
```
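After downloading, it can be worth sanity-checking that the file really is a GGUF container before pointing a runtime at it. Every GGUF file begins with the 4-byte magic `GGUF` followed by a little-endian uint32 format version. A minimal check (the file path below is a placeholder, not a file shipped by this repo):

```python
import struct

def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic and a plausible version."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return False
    version = struct.unpack("<I", header[4:8])[0]  # little-endian uint32 version
    return version >= 1

# Example (placeholder path):
# looks_like_gguf("shuttle-3.5.Q8_0.gguf")
```

This only validates the header, not the tensor data, but it catches truncated or misnamed downloads cheaply.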