metadata
license: apache-2.0
language:
  - en
  - zh
  - es
base_model:
  - shuttleai/shuttle-3.5
tags:
  - Qwen
  - Shuttle
  - GGUF
  - 32b
  - quantized
  - Q8_0

Shuttle 3.5 – Q8_0 GGUF Quant

This repo contains a GGUF-quantized version of ShuttleAI's Shuttle 3.5, a high-performance instruction-tuned variant of Qwen 3 32B. The quant targets efficient local inference with minimal quality loss.

🔗 Base Model

  • Original: shuttleai/shuttle-3.5
  • Parent architecture: Qwen 3 32B
  • Quantized by: Lex-au
  • Quantization format: GGUF Q8_0

📦 Model Size

Format                      | Size
--------------------------- | --------
Original (safetensors, F16) | 65.52 GB
Q8_0 (GGUF)                 | 34.8 GB

Size reduction: ~47% (30.72 GB saved)
Relative size: the Q8_0 file is roughly 53% of the original F16 weights.
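
The figures above follow directly from the two file sizes in the table; a quick sanity check in Python:

```python
orig_gb = 65.52  # F16 safetensors size listed above
q8_gb = 34.8     # Q8_0 GGUF size listed above

saved = orig_gb - q8_gb       # 30.72 GB saved on disk
reduction = saved / orig_gb   # ~0.47 -> ~47% smaller
relative = q8_gb / orig_gb    # ~0.53 -> Q8_0 is ~53% of the original size

print(f"Saved {saved:.2f} GB ({reduction:.0%} reduction, {relative:.0%} of original)")
```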

🧪 Quality

  • Q8_0 is near-lossless, preserving almost all performance of the full-precision model.
  • Ideal for high-quality inference on capable consumer hardware.

🚀 Usage

Compatible with all major GGUF-supporting runtimes, including:

  • llama.cpp
  • KoboldCPP
  • text-generation-webui
  • llamafile
  • LM Studio

Example with llama.cpp (recent llama.cpp builds name this binary llama-cli instead of main):

```bash
./main -m shuttle-3.5.Q8_0.gguf \
  --ctx-size 4096 \
  --threads 16 \
  --prompt "Describe the effects of quantum decoherence in plain English."
```
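
For programmatic use, here is a minimal sketch with the llama-cpp-python bindings. It assumes the package is installed and the GGUF file is in the working directory; the filename mirrors the CLI example above and may differ from the actual file in this repo.

```python
from llama_cpp import Llama

# Load the Q8_0 GGUF. n_gpu_layers=-1 offloads all layers to the GPU when
# llama-cpp-python was built with GPU support; set it to 0 for CPU-only runs.
llm = Llama(
    model_path="shuttle-3.5.Q8_0.gguf",
    n_ctx=4096,
    n_threads=16,
    n_gpu_layers=-1,
)

# Simple text completion, matching the CLI prompt above.
out = llm(
    "Describe the effects of quantum decoherence in plain English.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```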