# BioGPT INT8 Quantized for Medical Feature Extraction

This is an INT8 dynamically quantized version of Microsoft's BioGPT, intended for CPU inference.

## Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model in its default float32 precision: dynamic quantization
# on CPU starts from float32 weights, so do not load it in float16.
# Note: BioGPT's tokenizer also needs the `sacremoses` package installed.
tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")
model.eval()

# Replace the Linear layers with INT8 dynamically quantized versions
model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

# Use for inference (passing the attention mask along with the input IDs)
prompt = "Extract medical features: Patient is 45-year-old male with fever 101.2F"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
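
To see what the quantization call does to storage without downloading BioGPT, the same call can be applied to a small stand-in module. The sketch below is only an illustration: `serialized_size_mb` and the toy feed-forward block are hypothetical names and shapes chosen for this example, not part of BioGPT or of this model's release.

```python
import io

import torch
from torch import nn


def serialized_size_mb(module: nn.Module) -> float:
    """Return the size of the module's saved state_dict in megabytes."""
    buffer = io.BytesIO()
    torch.save(module.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6


# Hypothetical stand-in for a transformer feed-forward block (not BioGPT itself).
toy_fp32 = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).eval()

# Same call as in the Quick Start: only nn.Linear modules are converted.
toy_int8 = torch.quantization.quantize_dynamic(toy_fp32, {nn.Linear}, dtype=torch.qint8)

fp32_mb = serialized_size_mb(toy_fp32)
int8_mb = serialized_size_mb(toy_int8)
print(f"float32: {fp32_mb:.1f} MB, int8: {int8_mb:.1f} MB")
```

Because only `nn.Linear` modules are converted, embeddings and layer norms keep their original precision, so a full model typically shrinks by less than this all-Linear toy does.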

## Model Details

- Base: microsoft/biogpt
- Quantization: INT8 dynamic
- Size: ~85 MB (vs. 1.56 GB original)
- Optimized for: CPU inference
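
For readers unfamiliar with INT8 quantization, the arithmetic behind it can be sketched in plain Python. This is a simplified per-tensor affine scheme with made-up weights, written for illustration only. PyTorch's kernels differ in detail, and in dynamic quantization the activation ranges are additionally computed at run time. The function names here are hypothetical.

```python
def quantize_int8(values):
    """Map floats to int8 codes using one scale and zero point for the whole list."""
    lo = min(min(values), 0.0)  # the representable range must include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-0.62, -0.10, 0.0, 0.25, 0.91]  # made-up example weights
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("scale:", scale, "zero_point:", zero_point)
print("max round-trip error:", max_error)
```

Each weight is stored in one byte instead of four, at the cost of a round-trip error of at most about half a quantization step (`scale / 2`).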