---
base_model: shisa-ai/shisa-v2-llama3.1-405b
datasets:
- shisa-ai/shisa-v2-sharegpt
- shisa-ai/deepseekv3-ultrafeedback-armorm-dpo
language:
- ja
- en
- ko
- zh
library_name: transformers
license: llama3.1
model_name: shisa-v2-llama3.1-405b
quantized_by: leonardlin
---
## About

This repo contains select GGUF quants of shisa-ai/shisa-v2-llama3.1-405b.

- All quants were created with `b5503` of upstream llama.cpp
- All quants are weighted/imatrix quants created from our shisa-ai/shisa-v2-sharegpt bilingual dataset on the fp16 model, except for the Q8_0
- Files are pre-split at 45GB (below HF's 50GB upload limit). Modern llama.cpp builds should be able to load the sequential split files automatically, but you can use `llama-gguf-split --merge` if you want to merge them back together
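For example, merging splits back into a single file looks like this (a minimal sketch; the split filenames below are illustrative, so check the actual names in this repo):

```bash
# merge sequential splits back into one GGUF by pointing at the first split
# (filenames are examples only; the number of splits varies per quant)
build/bin/llama-gguf-split --merge \
    shisa-v2-llama3.1-405b-IQ3_XS-00001-of-00004.gguf \
    shisa-v2-llama3.1-405b-IQ3_XS-merged.gguf
```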
## Provided Quants

| Type | Size (GB) |
|---|---|
| IQ2_XXS | 155 |
| IQ3_XS | 155 |
| IQ3_M | 170 |
| IQ4_XS | 202 |
| Q4_K_M | 227 |
| Q8_0 | 402 |
Graph by ikawrakow comparing the PPL of some lower-quality quants (lower is better), via mradermacher:
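To try one of these quants, a download-and-serve sketch (the repo name matches this repo; the split filename and server settings are assumptions, so adjust them to the quant and hardware you actually use):

```bash
# grab only the IQ3_XS splits from this repo
huggingface-cli download shisa-ai/shisa-v2-llama3.1-405b-GGUF \
    --include "shisa-v2-llama3.1-405b-IQ3_XS-*" \
    --local-dir ./shisa-v2-405b-gguf

# point llama.cpp at the first split; recent builds pick up the remaining splits automatically
build/bin/llama-server \
    -m ./shisa-v2-405b-gguf/shisa-v2-llama3.1-405b-IQ3_XS-00001-of-00004.gguf \
    -ngl 99 -c 8192
```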
## Making Quants

```bash
# first you need an fp16 - set up the llama.cpp python env and run something like
python convert_hf_to_gguf.py ~/.cache/huggingface/hub/models--shisa-ai--shisa-v2-llama3.1-405b/snapshots/71b83a7cb998c3a44f59c83a9928596ac348b9b5 --outfile shisa-v2-llama3.1-405b-fp16.gguf

# create the imatrix: using 4 x H200 you can load 88 layers; takes about 1h15m
CUDA_VISIBLE_DEVICES=4,5,6,7 build/bin/llama-imatrix -m shisa-v2-llama3.1-405b-fp16.gguf -f /data/quantize/shisa-v2-llama-3.1-405b/gguf/calibration_chat.txt -o imatrix.dat -c 512 -b 512 --chunks 100 -ngl 88

# create your imatrix quants
build/bin/llama-quantize --imatrix imatrix.dat shisa-v2-llama3.1-405b-fp16.gguf shisa-v2-llama3.1-405b-IQ3_XS.gguf IQ3_XS

# split the quants
build/bin/llama-gguf-split --split-max-size 45G shisa-v2-llama3.1-405b-IQ3_XS.gguf shisa-v2-llama3.1-405b-IQ3_XS

# upload (bash loop)
for f in shisa-v2-llama3.1-405b-IQ3_XS-0000*; do huggingface-cli upload shisa-ai/shisa-v2-llama3.1-405b-GGUF "$f"; done
```
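The quantize/split/upload steps repeat per quant type. A sketch of looping over the imatrix types in the table above (Q8_0 is excluded since it was not an imatrix quant; paths and names assume the same layout as the commands above):

```bash
# produce, split, and upload each imatrix quant type published in this repo
for QTYPE in IQ2_XXS IQ3_XS IQ3_M IQ4_XS Q4_K_M; do
    OUT="shisa-v2-llama3.1-405b-${QTYPE}.gguf"
    build/bin/llama-quantize --imatrix imatrix.dat shisa-v2-llama3.1-405b-fp16.gguf "$OUT" "$QTYPE"
    build/bin/llama-gguf-split --split-max-size 45G "$OUT" "shisa-v2-llama3.1-405b-${QTYPE}"
    for f in shisa-v2-llama3.1-405b-${QTYPE}-0000*; do
        huggingface-cli upload shisa-ai/shisa-v2-llama3.1-405b-GGUF "$f"
    done
done
```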