# BioGPT INT8 Quantized for Medical Feature Extraction

This is an INT8 quantized version of Microsoft's BioGPT for CPU inference.

## Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model in float32: PyTorch dynamic quantization requires
# float32 weights, so do not pass torch_dtype=torch.float16 here
tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")

# Quantize all Linear layers to INT8 (weights stored as qint8,
# activations quantized dynamically at runtime)
model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
model.eval()

# Use for inference
prompt = "Extract medical features: Patient is 45-year-old male with fever 101.2F"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Details

- Base: microsoft/biogpt
- Quantization: INT8 dynamic (`torch.nn.Linear` layers only; embeddings remain float32)
- Size: ~85MB (vs 1.56GB original)
- Optimized for: CPU inference
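
## Saving and Reloading

Dynamically quantized models cannot be reloaded with `from_pretrained` directly, because the quantized modules no longer match the original architecture's state dict. A minimal sketch of one workaround, assuming the `model` object from the Quick Start above (the file name is illustrative): save the quantized `state_dict`, then re-apply the same quantization to a fresh float32 copy before loading it.

```python
import torch
from transformers import AutoModelForCausalLM

# Save the quantized weights (path is illustrative)
torch.save(model.state_dict(), "biogpt_int8_state_dict.pt")

# To reload: rebuild the float32 model, re-apply the identical dynamic
# quantization so the module structure matches, then load the saved weights
reloaded = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")
reloaded = torch.quantization.quantize_dynamic(reloaded, {torch.nn.Linear}, dtype=torch.qint8)
reloaded.load_state_dict(torch.load("biogpt_int8_state_dict.pt"))
reloaded.eval()
```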
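
## Benchmarking

CPU speedup depends heavily on hardware and thread count, so it is worth measuring on your own machine. A rough sketch, assuming the `model` and `tokenizer` objects from the Quick Start above (the prompt and token count are arbitrary):

```python
import time
import torch

def time_generation(m, tok, prompt, n_tokens=50):
    # Wall-clock time to generate a fixed number of new tokens on CPU
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        start = time.perf_counter()
        m.generate(**inputs, max_new_tokens=n_tokens)
        return time.perf_counter() - start

prompt = "Extract medical features: Patient is 45-year-old male with fever 101.2F"
print(f"INT8 latency: {time_generation(model, tokenizer, prompt):.2f}s")
```

Running the same function against an unquantized float32 copy of the model gives a direct before/after comparison on your hardware.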