TheBloke
/

OpenOrcaxOpenChat-Preview2-13B-GPTQ

Text Generation

text-generation-inference

4-bit precision

Model card Files Files and versions Community

TheBloke commited on Aug 3, 2023

Commit

192e7b7

•

1 Parent(s): 727ff49

Update README.md

Files changed (1) hide show

README.md +2 -0

README.md CHANGED Viewed

@@ -58,6 +58,7 @@ Each separate quant is in a different branch.  See below for instructions on fet
 <details>
   <summary>Explanation of GPTQ parameters</summary>
 - Bits: The bit size of the quantised model.
 - GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" is the lowest possible value.
 - Act Order: True or False. Also known as `desc_act`. True results in better quantisation accuracy. Some GPTQ clients have issues with models that use Act Order plus Group Size.
@@ -65,6 +66,7 @@ Each separate quant is in a different branch.  See below for instructions on fet
 - GPTQ dataset: The dataset used for quantisation. The dataset used for quantisation can affect the quantisation accuracy. The dataset used for quantisation is not the same as the dataset used to train the model.
 - Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16+K), a lower sequence length may have to be used.  Note that a lower sequence length does not limit the sequence length of the quantised model. It only affects the quantisation accuracy on longer inference sequences.
 - ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama models in 4-bit.
 </details>
 | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama Compat? | By | Desc |

 <details>
   <summary>Explanation of GPTQ parameters</summary>
 - Bits: The bit size of the quantised model.
 - GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" is the lowest possible value.
 - Act Order: True or False. Also known as `desc_act`. True results in better quantisation accuracy. Some GPTQ clients have issues with models that use Act Order plus Group Size.
 - GPTQ dataset: The dataset used for quantisation. The dataset used for quantisation can affect the quantisation accuracy. The dataset used for quantisation is not the same as the dataset used to train the model.
 - Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16+K), a lower sequence length may have to be used.  Note that a lower sequence length does not limit the sequence length of the quantised model. It only affects the quantisation accuracy on longer inference sequences.
 - ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama models in 4-bit.
 </details>
 | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama Compat? | By | Desc |