|
---
license: apache-2.0
language:
- en
- zh
- es
base_model:
- shuttleai/shuttle-3.5
tags:
- Qwen
- Shuttle
- GGUF
- 32b
- quantized
- Q8_0
---
|
|
|
# Shuttle 3.5 – Q8_0 GGUF Quant
|
|
|
This repo contains a GGUF quantized version of [ShuttleAI's Shuttle 3.5 model](https://huggingface.co/shuttleai/shuttle-3.5), a high-performance instruction-tuned variant of Qwen 3 32B. This quant was built for efficient local inference without sacrificing quality. |
|
|
|
## 🔗 Base Model
|
|
|
- **Original**: [shuttleai/shuttle-3.5](https://huggingface.co/shuttleai/shuttle-3.5)
- **Parent architecture**: Qwen 3 32B
- **Quantized by**: Lex-au
- **Quantization format**: GGUF Q8_0
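
Q8_0 stores weights in blocks of 32 values, each block holding one scale plus 32 signed 8-bit integers. The following is a minimal Python sketch of that idea for illustration only; it is not llama.cpp's actual implementation, and the function names are invented here:

```python
QK8_0 = 32  # values per Q8_0 block

def quantize_q8_0(block):
    """Quantize one block of 32 floats to (scale, list of int8 values)."""
    amax = max(abs(x) for x in block)
    d = amax / 127.0 if amax > 0 else 0.0   # per-block scale
    inv = 1.0 / d if d > 0 else 0.0
    qs = [round(x * inv) for x in block]    # integers in [-127, 127]
    return d, qs

def dequantize_q8_0(d, qs):
    """Reconstruct approximate floats from a quantized block."""
    return [d * q for q in qs]

# Round-trip a toy block of 32 weights.
weights = [0.5, -1.0, 0.25, 0.0] * 8
d, qs = quantize_q8_0(weights)
restored = dequantize_q8_0(d, qs)
err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={d:.6f}, max abs error={err:.6f}")
```

Because every value keeps 8 bits plus a shared scale, the worst-case rounding error per weight is half a quantization step, which is why Q8_0 is close to lossless in practice.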
|
|
|
## 📦 Model Size
|
|
|
| Format | Size |
|----------|----------|
| Original (safetensors, F16) | 65.52 GB |
| Q8_0 (GGUF) | 34.8 GB |
|
|
|
**Compression ratio**: ~1.9× (Q8_0 is ~53% of the F16 size)  
**Size reduction**: ~47% (30.72 GB saved)
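
The figures above follow directly from the two sizes in the table; a quick arithmetic check:

```python
# Size figures (GB) taken from the table above.
f16_gb = 65.52
q8_gb = 34.8

saved_gb = f16_gb - q8_gb        # absolute savings in GB
reduction = saved_gb / f16_gb    # fraction of the original size removed
ratio = f16_gb / q8_gb           # compression ratio

print(f"saved: {saved_gb:.2f} GB, reduction: {reduction:.0%}, ratio: {ratio:.2f}x")
```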
|
|
|
## 🧪 Quality
|
|
|
- Q8_0 is **near-lossless**, preserving almost all performance of the full-precision model. |
|
- Ideal for high-quality inference on capable consumer hardware. |
|
|
|
## 🚀 Usage
|
|
|
Compatible with all major GGUF-supporting runtimes, including: |
|
|
|
- `llama.cpp`
- `KoboldCPP`
- `text-generation-webui`
- `llamafile`
- `LM Studio`
|
|
|
Example with `llama.cpp`: |
|
|
|
```bash
./main -m shuttle-3.5.Q8_0.gguf --ctx-size 4096 --threads 16 --prompt "Describe the effects of quantum decoherence in plain English."
```

Recent `llama.cpp` builds name the binary `llama-cli` rather than `main`; the flags are otherwise the same.