# 🩺 MediMaven Llama-3.1-8B – AWQ 4-bit (v1.1)

Drop-in 4-bit AWQ quantisation of the MediMaven fp16 weights – fits on a 16 GB GPU (e.g. a T4).
## 💡 Why use this repo?
| Metric | Value |
|---|---|
| Footprint | ≈ 5.9 GB on disk / VRAM |
| Throughput | ~29 tok/s on a single T4 (batch = 1) |
| Accuracy loss | < 0.3 ROUGE vs fp16 |
## ⚡ Quick start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading an AWQ checkpoint through transformers requires the `autoawq` package.
tok = AutoTokenizer.from_pretrained("dranreb1660/medimaven-llama3-8b-awq")
model = AutoModelForCausalLM.from_pretrained(
    "dranreb1660/medimaven-llama3-8b-awq",
    device_map="auto",
    torch_dtype="auto",  # the AWQ 4-bit config stored in the checkpoint is applied automatically
)
```
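
Continuing from the snippet above, a minimal generation sketch — the example prompt and decoding settings are illustrative placeholders, not part of the repo:

```python
# Illustrative only: prompt and generation settings are placeholders.
messages = [{"role": "user", "content": "What are common side effects of ibuprofen?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```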
## 🔧 Quantisation details
- AWQ with `group_size=128`, `zero_point=True`, `zero_sym=True`
- Calibrated on 128 in-domain prompts (medical Q&A)
- Exported with AutoAWQ v0.2.3
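
For reference, a rough sketch of how an equivalent export could be reproduced with AutoAWQ. The `quant_config` keys below are the common AutoAWQ fields (the card's `zero_sym` setting is not shown; consult the AutoAWQ v0.2.3 docs for the exact options), and the calibration prompt list is a placeholder, not the actual 128-prompt in-domain set:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

src = "dranreb1660/medimaven-llama3-8b-fp16"
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(src)
tok = AutoTokenizer.from_pretrained(src)

# Placeholder calibration text; the release used 128 in-domain medical Q&A prompts.
calib_prompts = [
    "Q: What are common side effects of ibuprofen? A: Gastrointestinal upset and heartburn are common."
] * 128

model.quantize(tok, quant_config=quant_config, calib_data=calib_prompts)
model.save_quantized("medimaven-llama3-8b-awq")
tok.save_pretrained("medimaven-llama3-8b-awq")
```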
## 📋 Usage notes

- The model inherits all limitations and licensing terms of the fp16 weights.
- For maximum accuracy in secondary fine-tuning, use the fp16 repo instead (see the sketch below).
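
For that secondary fine-tuning path, a minimal sketch starting from the fp16 repo — LoRA via `peft` is an assumption here, not something this card prescribes, and the hyperparameters are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "dranreb1660/medimaven-llama3-8b-fp16"  # fine-tune from fp16, not the AWQ weights
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # then train with your usual Trainer / TRL setup
```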
## ⬆️ Versioning
- v1.1 = first public release (merged weights, new tokenizer template).
## 📖 Citation
```bibtex
@misc{medimaven2025llama3,
  title        = {MediMaven Llama-3.1-8B},
  author       = {Kyei-Mensah, Bernard},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/dranreb1660/medimaven-llama3-8b-fp16}}
}
```
## Model tree for dranreb1660/medimaven-llama3-8b-awq

- Base model: meta-llama/Meta-Llama-3-8B
- Fine-tuned fp16 parent: dranreb1660/medimaven-llama3-8b-fp16