
BioGPT INT8 Quantized for Medical Feature Extraction

This repository provides an INT8 dynamically quantized version of Microsoft's BioGPT for medical feature extraction on CPU. Quantization is applied at load time with PyTorch's dynamic quantization, as shown below.

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the FP32 base model; PyTorch dynamic INT8 quantization operates on
# FP32 weights, so do not pass torch_dtype=torch.float16 here
tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")

# Replace every torch.nn.Linear with a dynamically quantized INT8 version:
# weights are stored as INT8, activations are quantized on the fly at runtime
model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
model.eval()

# Run inference on CPU
prompt = "Extract medical features: Patient is 45-year-old male with fever 101.2F"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
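
Because the quantization in Quick Start happens at load time, you can cache the quantized weights to disk and reload them on later runs instead of re-quantizing every time. The sketch below is one way to do this, not an official loading path for this repository; the helper name load_quantized_biogpt and the file name biogpt_int8.pt are illustrative. Note that rebuilding the quantized skeleton still loads the FP32 base model first.

import torch
from transformers import AutoModelForCausalLM

def load_quantized_biogpt():
    # Build the quantized skeleton: same base model, same quantization spec
    base = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")
    return torch.quantization.quantize_dynamic(base, {torch.nn.Linear}, dtype=torch.qint8)

# First run: quantize once and cache the INT8 state dict
model = load_quantized_biogpt()
torch.save(model.state_dict(), "biogpt_int8.pt")

# Later runs: load the cached INT8 weights into a fresh quantized skeleton.
# weights_only=False because packed INT8 params are not plain tensors.
model = load_quantized_biogpt()
model.load_state_dict(torch.load("biogpt_int8.pt", weights_only=False))
model.eval()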

Model Details

  • Base model: microsoft/biogpt
  • Quantization: dynamic INT8 via torch.quantization.quantize_dynamic, applied to torch.nn.Linear layers only (embeddings and LayerNorms stay FP32)
  • Size: roughly a third of the ~1.56 GB FP32 checkpoint, since Linear weights drop from 4 bytes to 1 byte each
  • Optimized for: CPU inference (PyTorch's dynamic INT8 kernels run on CPU only)
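
You can sanity-check the size reduction on your own machine by serializing both checkpoints and comparing them. This is a minimal sketch; checkpoint_mb is a helper defined here, not part of transformers.

import os
import tempfile

import torch
from transformers import AutoModelForCausalLM

def checkpoint_mb(model):
    # Serialize the state dict to a temp file and report its on-disk size in MB
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pt")
        torch.save(model.state_dict(), path)
        return os.path.getsize(path) / 1e6

fp32 = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")
int8 = torch.quantization.quantize_dynamic(fp32, {torch.nn.Linear}, dtype=torch.qint8)

print(f"FP32 checkpoint: {checkpoint_mb(fp32):.0f} MB")
print(f"INT8 checkpoint: {checkpoint_mb(int8):.0f} MB")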