surya-ravindra committed · Commit 8333890 · verified · 1 Parent(s): 6c79016

Add README.md

Files changed (1)
  1. README.md +28 -0
README.md ADDED
@@ -0,0 +1,28 @@
+ # BioGPT INT8 Quantized for Medical Feature Extraction
+
+ This is an INT8 quantized version of Microsoft's BioGPT for CPU inference.
+
+ ## Quick Start
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the base model in float32 (dynamic quantization on CPU expects float32), then quantize
+ tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
+ model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt", torch_dtype=torch.float32)
+ model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
+ model.eval()
+
+ # Run inference on CPU
+ prompt = "Extract medical features: Patient is 45-year-old male with fever 101.2F"
+ inputs = tokenizer.encode(prompt, return_tensors="pt")
+ outputs = model.generate(inputs, max_new_tokens=100)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Model Details
+ - Base: microsoft/biogpt
+ - Quantization: INT8 dynamic
+ - Size: ~85 MB (vs. 1.56 GB original)
+ - Optimized for: CPU inference
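
The size figures under Model Details can be checked locally by serializing a model's `state_dict` before and after quantization. The following is a minimal sketch, not part of the committed README: the helper `serialized_size_mb` and the toy two-layer model are hypothetical stand-ins chosen so the example runs without downloading BioGPT. Passing the float32 and quantized BioGPT models from the Quick Start to the same helper should give the corresponding numbers for this model.

```python
import io

import torch


def serialized_size_mb(model: torch.nn.Module) -> float:
    """Return the size in MB of the model's state_dict as written by torch.save."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6


# Toy stand-in for the real model (hypothetical layer sizes, for illustration only).
float_model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)

# Same call as in the Quick Start: only nn.Linear layers are converted to INT8.
# quantize_dynamic returns a converted copy, so float_model is left unchanged.
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {torch.nn.Linear}, dtype=torch.qint8
)

float_mb = serialized_size_mb(float_model)
quantized_mb = serialized_size_mb(quantized_model)
print(f"float32: {float_mb:.2f} MB, int8: {quantized_mb:.2f} MB")
```

Because dynamic quantization here targets only `torch.nn.Linear`, layers of other types (embeddings, layer norms) keep their original precision, so the overall reduction on a full model depends on how much of it consists of linear layers.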