This model is fine-tuned on meta-llama/Llama-2-7b-chat-hf using MedQuAD (Medical Question Answering Dataset).
If you are interested in how to fine-tune Llama-2 or other LLMs, the repo explains the process.
Usage
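import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "meta-llama/Llama-2-7b-chat-hf"
adapter = "EdwardYu/llama-2-7b-MedQuAD"

tokenizer = AutoTokenizer.from_pretrained(adapter)

# Load the base model with 4-bit NF4 quantization. The settings live only in
# quantization_config: repeating load_in_4bit=True as a separate keyword is
# redundant and is rejected by recent transformers versions.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)
# Attach the fine-tuned adapter weights to the base model.
model = PeftModel.from_pretrained(model, adapter)

question = "What are the side effects or risks of Glucagon?"
# Move the inputs to wherever device_map="auto" placed the model.
inputs = tokenizer(question, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))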
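For a decoder-only model such as Llama-2, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded text above starts with the question itself. Below is a minimal sketch of keeping only the answer. The helper name `continuation_ids` is hypothetical (it is not part of transformers or peft), and the token ids are toy values used only for illustration.

```python
from typing import List, Sequence


def continuation_ids(output_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Return only the token ids generated after the prompt.

    Hypothetical helper: assumes output_ids begins with the prompt tokens,
    which is how decoder-only models return their output.
    """
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Toy ids standing in for real tokenizer output: 3 prompt tokens, 2 new tokens.
toy_output = [1, 529, 338, 4321, 2]
answer_ids = continuation_ids(toy_output, prompt_length=3)
print(answer_ids)
```

With the objects from the usage code, this would look roughly like `tokenizer.decode(continuation_ids(outputs[0].tolist(), inputs.input_ids.shape[1]), skip_special_tokens=True)`.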
To run inference faster, you can load the model in 16-bit precision (bfloat16) without 4-bit quantization, at the cost of higher GPU memory use.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
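To decide between the two loading modes, it can help to estimate how much memory the weights alone need. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes a nominal 7e9 parameters, and it ignores the adapter, activations, the KV cache, and the small overhead of quantization constants. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: a nominal 7e9 parameters for the 7B base model.
NUM_PARAMS = 7e9

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # 2 bytes per parameter
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # 0.5 bytes per parameter

print(f"bfloat16 weights: ~{bf16_gib:.1f} GiB")
print(f"4-bit weights:    ~{nf4_gib:.1f} GiB")
```

Under these assumptions the 16-bit weights take about four times the memory of the 4-bit weights (roughly 13 GiB versus a little over 3 GiB), so the faster 16-bit path needs a correspondingly larger GPU.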