
BioGPT INT8 Quantized for Medical Feature Extraction

This is an INT8 dynamically quantized version of Microsoft's BioGPT for medical feature extraction, optimized for CPU inference.

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the FP32 base model and quantize all Linear layers to INT8.
# Dynamic quantization runs on CPU and expects FP32 weights,
# so do not load with torch_dtype=torch.float16.
tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")
model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
model.eval()

# Use for inference (passing the attention mask along with the input IDs)
prompt = "Extract medical features: Patient is 45-year-old male with fever 101.2F"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
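The CPU speedup from INT8 dynamic quantization varies with hardware and sequence length, so it is worth measuring on your own machine. The sketch below is my own addition (not shipped with this repo) and times one greedy generation for the FP32 baseline against the quantized copy:

import time
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
inputs = tokenizer(
    "Extract medical features: Patient is 45-year-old male with fever 101.2F",
    return_tensors="pt",
)

# FP32 baseline vs. INT8 dynamically quantized copy
fp32_model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt").eval()
int8_model = torch.quantization.quantize_dynamic(
    fp32_model, {torch.nn.Linear}, dtype=torch.qint8
).eval()  # inplace=False by default, so fp32_model is left untouched

def time_generate(model):
    # Wall-clock time for one greedy generation of 50 new tokens
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=50)
    return time.perf_counter() - start

print(f"FP32: {time_generate(fp32_model):.2f}s")
print(f"INT8: {time_generate(int8_model):.2f}s")

Expect the largest gains in the matmul-heavy Linear layers; embeddings and attention softmaxes are untouched by dynamic quantization.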

Model Details

  • Base: microsoft/biogpt
  • Quantization: INT8 dynamic
  • Size: ~85MB (vs 1.56GB original); see the size-check sketch after this list
  • Optimized for: CPU inference
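The size figures above can be checked locally. Here is a minimal sketch (mine, not part of the repo) that serializes each state dict to an in-memory buffer and reports the byte count; note that dynamic quantization only covers nn.Linear, so the embedding tables stay in FP32:

import io
import torch
from transformers import AutoModelForCausalLM

def state_dict_mb(model):
    # Serialize the state dict into memory and measure it
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

fp32 = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")
# inplace=False (the default) deep-copies, leaving fp32 unquantized
int8 = torch.quantization.quantize_dynamic(fp32, {torch.nn.Linear}, dtype=torch.qint8)

print(f"FP32: {state_dict_mb(fp32):.0f} MB")
print(f"INT8: {state_dict_mb(int8):.0f} MB")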