---
base_model: Rakuten/RakutenAI-7B-instruct
inference: false
language:
- en
- ja
license: apache-2.0
model_creator: Rakuten
model_type: mistral
quantized_by: auto-gptq
tags:
- gptq
- 4bit
- vllm
- quantized
---
# RakutenAI-7B-instruct GPTQ

This is a 4-bit GPTQ-quantized version of [Rakuten/RakutenAI-7B-instruct](https://huggingface.co/Rakuten/RakutenAI-7B-instruct).
## Quantization Details

- Method: GPTQ
- Bits: 4
- Group size: 128
- Symmetric: True
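
For reference, these settings map onto auto-gptq's `BaseQuantizeConfig` roughly as follows (a minimal sketch; any field not listed above is left at its library default):

```python
from auto_gptq import BaseQuantizeConfig

# Mirrors the settings listed above; unspecified fields keep auto-gptq defaults
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # one scale/zero-point pair per 128 weights
    sym=True,        # symmetric quantization
)
```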
## Usage with vLLM

```python
from vllm import LLM

# vLLM detects the GPTQ weights automatically from the repo's quantization config
llm = LLM(model="geninhu/RakutenAI-7B-instruct-GPTQ")
```
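
A short generation sketch on top of the `llm` object above. The `USER:`/`ASSISTANT:` prompt style is an assumption borrowed from the base model's chat convention; check the [Rakuten/RakutenAI-7B-instruct](https://huggingface.co/Rakuten/RakutenAI-7B-instruct) card for the exact template.

```python
from vllm import SamplingParams

params = SamplingParams(temperature=0.7, max_tokens=128)
# Prompt format is an assumption; adapt it to the base model's template
outputs = llm.generate(["USER: What is the capital of Japan? ASSISTANT:"], params)
print(outputs[0].outputs[0].text)
```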
## Usage with Transformers

```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# Pass device explicitly; from_quantized loads to CPU by default
model = AutoGPTQForCausalLM.from_quantized("geninhu/RakutenAI-7B-instruct-GPTQ", device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained("geninhu/RakutenAI-7B-instruct-GPTQ")
```
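
From here, generation works like any other Transformers causal LM. A minimal sketch, with the same prompt-format caveat as in the vLLM section:

```python
prompt = "USER: What is the capital of Japan? ASSISTANT:"  # prompt format is an assumption
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```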