---
license: apache-2.0
language:
- en
- zh
- es
base_model:
- shuttleai/shuttle-3.5
tags:
- Qwen
- Shuttle
- GGUF
- 32b
- quantized
- Q8_0
---

# Shuttle 3.5 — Q8_0 GGUF Quant

This repo contains a GGUF quantized version of [ShuttleAI's Shuttle 3.5 model](https://huggingface.co/shuttleai/shuttle-3.5), a high-performance instruction-tuned variant of Qwen 3 32B. This quant was built for efficient local inference without sacrificing quality.

## 🔗 Base Model

- **Original**: [shuttleai/shuttle-3.5](https://huggingface.co/shuttleai/shuttle-3.5)
- **Parent architecture**: Qwen 3 32B
- **Quantized by**: Lex-au
- **Quantization format**: GGUF Q8_0

## 📦 Model Size

| Format   | Size     |
|----------|----------|
| Original (safetensors, F16) | 65.52 GB |
| Q8_0 (GGUF)                 | 34.8 GB  |

**Compression Ratio**: Q8_0 is ~53% of the original size  
**Size Reduction**: ~47% (30.72 GB saved)
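As a quick sanity check, the savings follow directly from the two sizes in the table above:

```python
# Quick arithmetic check of the size figures in the table above.
original_gb = 65.52   # F16 safetensors
quant_gb = 34.8       # Q8_0 GGUF

saved_gb = original_gb - quant_gb
reduction = saved_gb / original_gb

print(f"Saved: {saved_gb:.2f} GB")    # ~30.72 GB
print(f"Reduction: {reduction:.0%}")  # ~47%
```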

## 🧪 Quality

- Q8_0 is **near-lossless**, preserving almost all performance of the full-precision model.
- Ideal for high-quality inference on capable consumer hardware.

## 🚀 Usage

Compatible with all major GGUF-supporting runtimes, including:

- `llama.cpp`
- `KoboldCPP`
- `text-generation-webui`
- `llamafile`
- `LM Studio`

Example with `llama.cpp`:

```bash
# Note: newer llama.cpp builds name this binary `llama-cli` instead of `main`.
./main -m shuttle-3.5.Q8_0.gguf --ctx-size 4096 --threads 16 \
  --prompt "Describe the effects of quantum decoherence in plain English."
```