Sybil - Lung Cancer Risk Prediction
Model Description
Sybil is a validated deep learning model that predicts future lung cancer risk from a single low-dose chest CT (LDCT) scan. Described in a 2023 paper in the Journal of Clinical Oncology, it outputs a risk score for each year from 1 to 6 years after the scan.
Key Features
- Single Scan Analysis: Requires only one LDCT scan
- Multi-Year Prediction: Provides risk scores for years 1-6
- Validated Performance: Evaluated on a held-out NLST test set and on external cohorts from MGH (USA) and CGMH (Taiwan)
- Ensemble Approach: Uses 5 models for robust predictions
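To illustrate the ensemble idea, here is a minimal sketch that combines per-year risk scores from several members by simple averaging. This is an assumption for illustration only: the function name `average_ensemble` and the toy scores are hypothetical, and the actual aggregation inside `SybilHFWrapper` may differ.

```python
# Hypothetical sketch: combine per-year risks from ensemble members by a plain mean.
# The real SybilHFWrapper aggregation is not documented here and may differ.
from statistics import fmean


def average_ensemble(member_risks: list[list[float]]) -> list[float]:
    """Average year-by-year risk scores across ensemble members.

    member_risks: one list per model, each holding risks for years 1..N.
    """
    if not member_risks:
        raise ValueError("need at least one ensemble member")
    n_years = len(member_risks[0])
    if any(len(r) != n_years for r in member_risks):
        raise ValueError("all members must report the same number of years")
    # zip(*...) regroups the scores so each tuple holds one year's values
    return [fmean(year_scores) for year_scores in zip(*member_risks)]


# Toy, made-up scores for three members (the card states Sybil uses 5)
members = [
    [0.01, 0.02, 0.03, 0.04, 0.05, 0.06],
    [0.02, 0.03, 0.04, 0.05, 0.06, 0.07],
    [0.03, 0.04, 0.05, 0.06, 0.07, 0.08],
]
print([round(x, 3) for x in average_ensemble(members)])
```

If every member's curve is non-decreasing from year to year, the averaged curve is too, so simple averaging does not break the shape of the risk trajectory.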
Quick Start
Installation
pip install huggingface-hub torch torchvision pydicom
Basic Usage
from huggingface_hub import snapshot_download
import sys
# Download model
model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
sys.path.append(model_path)
# Import model
from modeling_sybil_wrapper import SybilHFWrapper
from configuration_sybil import SybilConfig
# Initialize
config = SybilConfig()
model = SybilHFWrapper(config)
# Prepare your DICOM files (CT scan slices)
dicom_paths = ["scan1.dcm", "scan2.dcm"]  # ... replace with the actual paths of every slice in the series
# Get predictions
output = model(dicom_paths=dicom_paths)
risk_scores = output.risk_scores.numpy()
# Display results
print("Lung Cancer Risk Predictions:")
for i, score in enumerate(risk_scores):
    print(f"Year {i+1}: {score*100:.1f}%")
Example with Demo Data
import requests
import zipfile
from io import BytesIO
import os
# Download demo DICOM files
def get_demo_data():
    cache_dir = os.path.expanduser("~/.sybil_demo")
    demo_dir = os.path.join(cache_dir, "sybil_demo_data")
    if not os.path.exists(demo_dir):
        print("Downloading demo data...")
        url = "https://www.dropbox.com/scl/fi/covbvo6f547kak4em3cjd/sybil_example.zip?rlkey=7a13nhlc9uwga9x7pmtk1cf1c&dl=1"
        response = requests.get(url, timeout=120)
        response.raise_for_status()  # fail early on HTTP errors instead of parsing a bad zip
        os.makedirs(cache_dir, exist_ok=True)
        with zipfile.ZipFile(BytesIO(response.content)) as zf:
            zf.extractall(cache_dir)
    # Find DICOM files
    dicom_files = []
    for root, dirs, files in os.walk(cache_dir):
        for file in files:
            if file.endswith('.dcm'):
                dicom_files.append(os.path.join(root, file))
    return sorted(dicom_files)
# Run demo
from huggingface_hub import snapshot_download
import sys
# Load model
model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
sys.path.append(model_path)
from modeling_sybil_wrapper import SybilHFWrapper
from configuration_sybil import SybilConfig
# Initialize and predict
config = SybilConfig()
model = SybilHFWrapper(config)
dicom_files = get_demo_data()
output = model(dicom_paths=dicom_files)
# Show results
for i, score in enumerate(output.risk_scores.numpy()):
    print(f"Year {i+1}: {score*100:.1f}% risk")
Expected output for demo data:
Year 1: 2.2% risk
Year 2: 4.5% risk
Year 3: 7.2% risk
Year 4: 7.9% risk
Year 5: 9.6% risk
Year 6: 13.6% risk
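The demo scores increase every year, which is consistent with each value being the cumulative risk of a diagnosis by the end of that year. Under that assumption (not stated explicitly in this card), the extra risk attributed to each individual year is the difference between consecutive scores. The helper name `yearly_increments` is hypothetical.

```python
# Sketch, assuming each score is cumulative risk by the end of that year.
def yearly_increments(cumulative: list[float]) -> list[float]:
    """Return the additional risk attributed to each year."""
    increments = []
    previous = 0.0
    for value in cumulative:
        increments.append(value - previous)
        previous = value
    return increments


# Percentages copied from the expected demo output above
demo = [2.2, 4.5, 7.2, 7.9, 9.6, 13.6]
for year, delta in enumerate(yearly_increments(demo), start=1):
    print(f"Year {year}: +{delta:.1f} percentage points")
```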
Performance Metrics
Dataset | 1-Year AUC | 6-Year C-index | Sample Size (LDCT scans) |
---|---|---|---|
NLST (held-out test) | 0.92 | 0.75 | 6,282 |
MGH | 0.86 | 0.81 | 8,821 |
CGMH (Taiwan) | 0.94 | 0.80 | 12,280 |
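As a reminder of what the 1-year AUC measures, the sketch below computes it directly as the probability that a randomly chosen positive case (cancer diagnosed within the horizon) receives a higher score than a randomly chosen negative case, with ties counted as half. The function name and toy data are hypothetical; this is not the evaluation code used in the paper.

```python
# Illustrative pairwise AUC on toy data (quadratic in the number of cases).
from itertools import product


def pairwise_auc(scores: list[float], labels: list[int]) -> float:
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0   # positive ranked above negative
        elif p == n:
            wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up 1-year risk scores and outcomes (1 = diagnosed within 1 year)
scores = [0.90, 0.80, 0.10, 0.85]
labels = [1, 1, 0, 0]
print(pairwise_auc(scores, labels))  # 0.75
```

For real evaluations, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity far more efficiently.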
Intended Use
Primary Use Cases
- Risk stratification in lung cancer screening programs
- Research on lung cancer prediction models
- Clinical decision support (with appropriate oversight)
Users
- Healthcare providers
- Medical researchers
- Screening program coordinators
Out of Scope
- Diagnosis of existing cancer
- Use with non-LDCT imaging (X-rays, MRI)
- Sole basis for clinical decisions
- Use outside medical supervision
Input Requirements
- Format: DICOM files from chest CT scan
- Type: Low-dose CT (LDCT)
- Orientation: Axial view
- Order: Anatomically ordered (from abdomen to clavicles)
- Number of slices: Typically 100-300 slices
- Resolution: Automatically handled by model
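If your files are not already in anatomical order, one common approach is to sort them by the z-coordinate of the DICOM `ImagePositionPatient` attribute. In the DICOM patient coordinate system +z points toward the head, so ascending z runs from abdomen to clavicles. This is a sketch under that assumption; whether `SybilHFWrapper` already reorders slices internally is not stated here, and the helper names are hypothetical. The reader is injectable so the sorting logic can run without pydicom.

```python
# Sketch: order a series' slices inferior -> superior (abdomen first).
from typing import Callable, Iterable


def read_z_with_pydicom(path: str) -> float:
    import pydicom  # imported lazily so the sorting logic has no hard dependency

    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return float(ds.ImagePositionPatient[2])  # z (superior) coordinate in mm


def order_slices(paths: Iterable[str],
                 read_z: Callable[[str], float] = read_z_with_pydicom) -> list[str]:
    """Return paths sorted by slice position, abdomen first."""
    return sorted(paths, key=read_z)


# Usage with real files (hypothetical): dicom_paths = order_slices(dicom_paths)
```

For tilted or oblique acquisitions, projecting the position onto the slice normal derived from `ImageOrientationPatient` is more robust than using z alone.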
Important Considerations
Medical AI Notice
This model should supplement, not replace, clinical judgment. Always consider:
- Complete patient medical history
- Additional risk factors (smoking, family history)
- Current clinical guidelines
- Need for professional medical oversight
Limitations
- Optimized for the lung cancer screening population (ages 55-80)
- Best performance with LDCT scans
- Not validated for pediatric use
- Performance may vary with different scanner manufacturers
Citation
If you use this model, please cite the original paper:
@article{mikhael2023sybil,
title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and others},
journal={Journal of Clinical Oncology},
volume={41},
number={12},
pages={2191--2200},
year={2023},
publisher={American Society of Clinical Oncology}
}
Acknowledgments
This Hugging Face implementation is based on the original work by:
- Original Authors: Peter G. Mikhael & Jeremy Wohlwend
- Institutions: MIT CSAIL & Massachusetts General Hospital
- Original Repository: GitHub
- Paper: Journal of Clinical Oncology
License
MIT License - See LICENSE file
- Original Model © 2022 Peter Mikhael & Jeremy Wohlwend
- HF Adaptation © 2024 Lab-Rasool
Troubleshooting
Common Issues
Import Error: Make sure to append the model path to sys.path before importing the wrapper modules
sys.path.append(model_path)
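A slightly more defensive version appends the snapshot directory only once and checks that the wrapper module file is present before you import it. This is a hypothetical helper (`add_model_path` is not part of the repository); the default module name comes from the import used in Basic Usage.

```python
# Hypothetical helper: add model_path to sys.path once and verify the module exists.
import importlib
import importlib.util
import os
import sys


def add_model_path(model_path: str, module: str = "modeling_sybil_wrapper") -> bool:
    """Append model_path to sys.path if needed; return True if `module` is importable."""
    if model_path not in sys.path:
        sys.path.append(model_path)
    if not os.path.isfile(os.path.join(model_path, f"{module}.py")):
        return False  # snapshot is incomplete or the path is wrong
    importlib.invalidate_caches()  # make sure newly added files are visible
    return importlib.util.find_spec(module) is not None
```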
Missing Dependencies: Install all requirements
pip install torch torchvision pydicom sybil huggingface-hub
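To see which of the packages from the Installation command are missing before running the model, you can look up their import names without importing them. This sketch covers only the four packages from that command; the mapping and the helper name `missing_packages` are assumptions for illustration.

```python
# Sketch: report pip names whose import name cannot be found.
import importlib.util

REQUIRED = {
    "huggingface-hub": "huggingface_hub",  # pip name differs from import name
    "torch": "torch",
    "torchvision": "torchvision",
    "pydicom": "pydicom",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return pip names whose top-level module is not importable."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("pip install " + " ".join(missing))
else:
    print("All listed dependencies are importable.")
```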
DICOM Loading Error: Ensure DICOM files are valid CT scans
import pydicom
dcm = pydicom.dcmread("your_file.dcm")  # Test a single file
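To check a whole series at once, the sketch below reads each header and reports files that fail to parse or whose `Modality` is not `CT`. The helper names are hypothetical, and the reader is injectable so the logic can be exercised without real DICOM data.

```python
# Sketch: list files that are unreadable or not CT, without loading pixel data.
from typing import Any, Callable


def read_header(path: str) -> Any:
    import pydicom  # lazy import; only needed for real files
    return pydicom.dcmread(path, stop_before_pixels=True)


def find_problem_files(paths: list[str],
                       reader: Callable[[str], Any] = read_header) -> dict[str, str]:
    """Map each problematic path to a short description of the problem."""
    problems = {}
    for path in paths:
        try:
            ds = reader(path)
        except Exception as exc:  # report any read failure instead of stopping
            problems[path] = f"unreadable: {exc}"
            continue
        modality = getattr(ds, "Modality", None)
        if modality != "CT":
            problems[path] = f"Modality is {modality!r}, expected 'CT'"
    return problems
```

An empty result means every file parsed and is labeled as CT; it does not confirm that the scan is low-dose.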
Memory Issues: The model requires roughly 4 GB of GPU memory; fall back to the CPU when no GPU is available
import torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'
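A variant of the device check also looks at how much GPU memory is actually free and falls back to the CPU below roughly 4 GB. The threshold logic and the helper name `pick_device` are assumptions, and whether the wrapper can be moved with `.to(device)` is not documented in this card.

```python
# Sketch: use the GPU only if it reports enough free memory, else the CPU.
FOUR_GB = 4 * 1024**3  # approximate requirement from the note above


def pick_device(min_free_bytes: int = FOUR_GB) -> str:
    try:
        import torch
    except ImportError:
        return "cpu"  # torch missing: nothing to put on a GPU anyway
    if not torch.cuda.is_available():
        return "cpu"
    free_bytes, _total_bytes = torch.cuda.mem_get_info()  # current device
    return "cuda" if free_bytes >= min_free_bytes else "cpu"


print(pick_device())
```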
Support
- HF Model Issues: Open issue on this repository
- Original Model: GitHub Issues
- Medical Questions: Consult healthcare professionals
Additional Resources
Note: This is a research model. Always consult qualified healthcare professionals for medical decisions.