metadata
inference: false
language:
  - en
pipeline_tag: text-generation
tags:
  - llama
  - llama-2
license: agpl-3.0

CalliopeDS-L2-13B-exl2

ExLlamaV2 quant of Doctor-Shotgun/CalliopeDS-L2-13B

Branches:

  • main: 4 decoder bits per weight, 6 head bits
    • ideal for 12 GB GPUs, or 16 GB GPUs with NTK extended context or CFG
  • 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
    • ideal for 16 GB GPUs, or 24 GB GPUs with NTK extended context or CFG
  • 8bit-32g-h8: all tensors 8-bit with group size 32, 8 head bits
    • experimental quant, made with exllamav2 monkeypatched to quantize all tensors to 8-bit with group size 32
    • similar in size to the old GPTQ 8-bit quants without group size; a 24 GB GPU is recommended
  • maxbpw-h8: ???bpw, 8 head bits
    • experimental quant, the largest optimized mixed quant that the current version of exllamav2 produces
    • somewhat larger than 6.0bpw-h6 but smaller than 8bit-32g-h8; a 24 GB GPU is recommended
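
The branch recommendations above can be sketched as a small helper. This is only an illustration: `REPO_ID` is assumed from the card title, `approx_weight_gib`, `pick_branch`, and `download` are hypothetical names, the 13e9 parameter count is a round figure, and the size estimate covers the decoder weights only (no KV cache, head layer, or runtime overhead). The download step relies on `snapshot_download` from the third-party `huggingface_hub` package, which accepts a `revision` argument to select a branch.

```python
# Hypothetical helper: thresholds are read off the branch list in this card,
# and the repo id is an assumption based on the card title.
REPO_ID = "Doctor-Shotgun/CalliopeDS-L2-13B-exl2"


def approx_weight_gib(bits_per_weight, n_params=13e9):
    """Rough size of the quantized decoder weights in GiB (weights only)."""
    return n_params * bits_per_weight / 8 / 1024**3


def pick_branch(vram_gb, extended_context_or_cfg=False):
    """Return the non-experimental branch suggested for a given VRAM size.

    The experimental branches (8bit-32g-h8, maxbpw-h8) are left to manual
    selection. Below the smallest listed size this still returns "main",
    since it is the smallest branch available.
    """
    if extended_context_or_cfg:
        # NTK extended context or CFG needs extra headroom, so step down a size.
        return "6.0bpw-h6" if vram_gb >= 24 else "main"
    return "6.0bpw-h6" if vram_gb >= 16 else "main"


def download(branch, local_dir=None):
    """Fetch one branch of the repo; needs `pip install huggingface_hub`."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    return snapshot_download(repo_id=REPO_ID, revision=branch, local_dir=local_dir)
```

For example, `download(pick_branch(16))` would fetch the `6.0bpw-h6` branch. The estimate also shows roughly why the sizes line up: at 4 bits per weight the decoder weights come to about 6 GiB, leaving room for context on a 12 GB card.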