google/gemma-2b-it - W4A16 Compression

This repository (espressor/google.gemma-2b-it_W4A16) contains a compressed version of google/gemma-2b-it, produced with llmcompressor.

Compression Configuration

  • Base Model: google/gemma-2b-it
  • Compression Scheme: W4A16
  • Dataset: HuggingFaceH4/ultrachat_200k
  • Dataset Split: train_sft
  • Number of Samples: 512
  • Preprocessor: chat
  • Maximum Sequence Length: 8192
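
W4A16 means the weights are stored as 4-bit integers while activations stay in 16-bit floating point. The sketch below illustrates the general idea with symmetric, group-wise 4-bit rounding in plain Python. It is not llmcompressor's implementation: the function names are hypothetical, the group size is an illustrative assumption, and the calibration step that uses the dataset samples listed above is left out entirely.

```python
def quantize_group_int4(weights):
    """Symmetric 4-bit quantization of one group of weights.

    Returns (codes, scale), where every code is an integer in [-8, 7].
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude in the group to 7, the largest positive int4 code.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Recover approximate floating-point weights from int4 codes."""
    return [c * scale for c in codes]


def quantize_row(row, group_size=128):
    """Quantize a weight row group by group; each group gets its own scale.

    group_size=128 is an assumption for illustration, not a value from this card.
    """
    return [
        quantize_group_int4(row[start:start + group_size])
        for start in range(0, len(row), group_size)
    ]


# Toy example: 8 weights split into 2 groups of 4.
row = [0.7, -0.35, 0.1, 0.0, -0.7, 0.2, 0.05, -0.15]
groups = quantize_row(row, group_size=4)
restored = [v for codes, scale in groups for v in dequantize_group(codes, scale)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Each group keeps one floating-point scale next to its integer codes, so the rounding error for any weight is bounded by roughly half of that group's scale.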

Sample Output

Prompt:

<bos><start_of_turn>user
Who is Alan Turing?<end_of_turn>
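
The prompt above follows Gemma's turn-based chat format. As a minimal sketch, the same string can be assembled by hand as shown below. The helper name and its parameters are hypothetical; in practice the tokenizer's chat template would normally build this string.

```python
def format_gemma_prompt(user_message, add_bos=True, add_generation_prompt=False):
    """Build a single-turn Gemma prompt string (hypothetical helper)."""
    bos = "<bos>" if add_bos else ""
    prompt = f"{bos}<start_of_turn>user\n{user_message}<end_of_turn>\n"
    if add_generation_prompt:
        # Opens the model's turn so that generation starts with the reply.
        prompt += "<start_of_turn>model\n"
    return prompt


prompt = format_gemma_prompt("Who is Alan Turing?")
```

Note that if the string already begins with `<bos>` and the tokenizer also prepends its own BOS token, the decoded text will start with a doubled `<bos>`. Passing `add_bos=False` in this sketch avoids that, assuming the tokenizer adds the token itself.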

Output:

<bos><bos><start_of_turn>user
Who is Alan Turing?<end_of_turn>
Alan Turing was a British mathematician and computer scientist who made significant contributions to the fields of mathematics, computer science, and physics. He is considered one of the pioneers of computer science and a major figure in the history of artificial intelligence.

**Key Contributions:**

* **Coined the term "computer science"**: Turing was one of the first to use the term in a modern sense to refer to the study of the theoretical and practical principles of computation.
* **Developed the Turing machine**: This is a theoretical model of computation that is considered to be the most powerful known model of computation.
* **Pioneered research in artificial

Evaluation

Checkpoint Details

  • Format: Safetensors
  • Model size: 1.31B params
  • Tensor types: I64, I32, BF16
