---
base_model: Rakuten/RakutenAI-7B-instruct
inference: false
language:
- en
- ja
license: apache-2.0
model_creator: Rakuten
model_type: mistral
quantized_by: auto-gptq
tags:
- gptq
- 4bit
- vllm
- quantized
---
# RakutenAI-7B-instruct GPTQ

This is a 4-bit GPTQ-quantized version of [Rakuten/RakutenAI-7B-instruct](https://huggingface.co/Rakuten/RakutenAI-7B-instruct).
## Quantization Details

- Method: GPTQ
- Bits: 4
- Group size: 128
- Symmetric: True
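
For reference, these settings map onto auto-gptq's `BaseQuantizeConfig` roughly as follows (a minimal sketch; any field not listed above is left at its library default):

```python
from auto_gptq import BaseQuantizeConfig

# Mirrors the settings listed above; unspecified fields keep auto-gptq defaults
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # one scale/zero-point pair per 128 weights
    sym=True,        # symmetric quantization
)
```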
## Usage with vLLM

```python
from vllm import LLM

# vLLM detects the GPTQ weights automatically from the repo's quantization config
llm = LLM(model="geninhu/RakutenAI-7B-instruct-GPTQ")
```
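
A short generation sketch on top of the `llm` object above. The `USER:`/`ASSISTANT:` prompt style is an assumption borrowed from the base model's chat convention; check the [Rakuten/RakutenAI-7B-instruct](https://huggingface.co/Rakuten/RakutenAI-7B-instruct) card for the exact template.

```python
from vllm import SamplingParams

params = SamplingParams(temperature=0.7, max_tokens=128)
# Prompt format is an assumption; adapt it to the base model's template
outputs = llm.generate(["USER: What is the capital of Japan? ASSISTANT:"], params)
print(outputs[0].outputs[0].text)
```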
## Usage with Transformers

```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# Pass device explicitly; from_quantized loads to CPU by default
model = AutoGPTQForCausalLM.from_quantized("geninhu/RakutenAI-7B-instruct-GPTQ", device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained("geninhu/RakutenAI-7B-instruct-GPTQ")
```
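
From here, generation works like any other Transformers causal LM. A minimal sketch, with the same prompt-format caveat as in the vLLM section:

```python
prompt = "USER: What is the capital of Japan? ASSISTANT:"  # prompt format is an assumption
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```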