---
license: mit
tags:
- medical
- cancer
- ct-scan
- risk-prediction
- healthcare
- pytorch
- vision
datasets:
- NLST
metrics:
- auc
- c-index
language:
- en
library_name: transformers
pipeline_tag: image-classification
---

# Sybil - Lung Cancer Risk Prediction

## Model Description

Sybil is a validated deep learning model that predicts future lung cancer risk from a single low-dose chest CT (LDCT) scan. The model was published in the Journal of Clinical Oncology and estimates cancer risk over a horizon of 1 to 6 years.

### Key Features
- **Single Scan Analysis**: Requires only one LDCT scan
- **Multi-Year Prediction**: Provides a risk score for each of years 1-6
- **Validated Performance**: Evaluated on data from multiple institutions (NLST, MGH, and CGMH Taiwan)
- **Ensemble Approach**: Combines 5 models for more robust predictions

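To illustrate the ensemble idea, here is a minimal sketch that combines per-year scores from several models by simple averaging. The averaging rule, the function name `average_ensemble`, and the example numbers are assumptions for illustration; this card does not specify how the 5 models are actually combined.

```python
from statistics import fmean


def average_ensemble(per_model_scores):
    """Average year-by-year risk scores across ensemble members.

    per_model_scores: list of lists, one inner list of per-year scores per model.
    Simple averaging is an assumption here, not the confirmed Sybil procedure.
    """
    if not per_model_scores:
        raise ValueError("need scores from at least one model")
    n_years = len(per_model_scores[0])
    if any(len(scores) != n_years for scores in per_model_scores):
        raise ValueError("all models must predict the same number of years")
    # zip(*...) regroups the scores so each tuple holds one year across models
    return [fmean(year_scores) for year_scores in zip(*per_model_scores)]


# Hypothetical scores from 5 models over 6 years (illustrative values only)
example_scores = [
    [0.020, 0.040, 0.070, 0.080, 0.090, 0.130],
    [0.025, 0.050, 0.075, 0.080, 0.100, 0.140],
    [0.020, 0.045, 0.070, 0.075, 0.095, 0.135],
    [0.022, 0.044, 0.072, 0.080, 0.096, 0.138],
    [0.023, 0.046, 0.073, 0.080, 0.099, 0.137],
]
for year, score in enumerate(average_ensemble(example_scores), start=1):
    print(f"Year {year}: {score * 100:.1f}%")
```

Averaging keeps each combined score within the range spanned by the individual models, which is one reason ensembles tend to be less sensitive to any single model's outliers.
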
## Quick Start

### Installation

```bash
pip install huggingface-hub torch torchvision pydicom
```

### Basic Usage

```python
import sys

from huggingface_hub import snapshot_download

# Download the model files and make the bundled modules importable
model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
sys.path.append(model_path)

# Import the wrapper and config shipped in the downloaded repository
from modeling_sybil_wrapper import SybilHFWrapper
from configuration_sybil import SybilConfig

# Initialize
config = SybilConfig()
model = SybilHFWrapper(config)

# Prepare your DICOM files (CT scan slices)
dicom_paths = ["scan1.dcm", "scan2.dcm", ...]  # Replace with actual paths

# Get predictions
output = model(dicom_paths=dicom_paths)
risk_scores = output.risk_scores.numpy()

# Display results
print("Lung Cancer Risk Predictions:")
for i, score in enumerate(risk_scores):
    print(f"Year {i+1}: {score*100:.1f}%")
```

## Example with Demo Data

```python
import os
import sys
import zipfile
from io import BytesIO

import requests
from huggingface_hub import snapshot_download


def get_demo_data():
    """Download the demo DICOM files (first call only) and return their sorted paths."""
    cache_dir = os.path.expanduser("~/.sybil_demo")
    demo_dir = os.path.join(cache_dir, "sybil_demo_data")

    if not os.path.exists(demo_dir):
        print("Downloading demo data...")
        url = "https://www.dropbox.com/scl/fi/covbvo6f547kak4em3cjd/sybil_example.zip?rlkey=7a13nhlc9uwga9x7pmtk1cf1c&dl=1"
        response = requests.get(url)
        response.raise_for_status()  # Stop early if the download failed

        os.makedirs(cache_dir, exist_ok=True)
        with zipfile.ZipFile(BytesIO(response.content)) as zf:
            zf.extractall(cache_dir)

    # Find DICOM files
    dicom_files = []
    for root, dirs, files in os.walk(cache_dir):
        for file in files:
            if file.endswith('.dcm'):
                dicom_files.append(os.path.join(root, file))

    return sorted(dicom_files)


# Load model
model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
sys.path.append(model_path)

from modeling_sybil_wrapper import SybilHFWrapper
from configuration_sybil import SybilConfig

# Initialize and predict
config = SybilConfig()
model = SybilHFWrapper(config)

dicom_files = get_demo_data()
output = model(dicom_paths=dicom_files)

# Show results
for i, score in enumerate(output.risk_scores.numpy()):
    print(f"Year {i+1}: {score*100:.1f}% risk")
```

Expected output for demo data:
```
Year 1: 2.2% risk
Year 2: 4.5% risk
Year 3: 7.2% risk
Year 4: 7.9% risk
Year 5: 9.6% risk
Year 6: 13.6% risk
```

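The demo scores above never decrease from one year to the next, which is consistent with each value being a cumulative risk up to that year. Under that assumption (this card does not state it explicitly), the following sketch splits the scores into the risk added in each year. The helper name `incremental_risk` is hypothetical.

```python
def incremental_risk(cumulative):
    """Split cumulative risk scores into the amount added in each year.

    Assumes the scores are cumulative probabilities, so they must not decrease.
    """
    increments = []
    previous = 0.0
    for value in cumulative:
        if value < previous:
            raise ValueError("cumulative risk scores must be non-decreasing")
        increments.append(value - previous)
        previous = value
    return increments


# Demo scores from the expected output above, expressed as fractions
demo_scores = [0.022, 0.045, 0.072, 0.079, 0.096, 0.136]
for year, (total, added) in enumerate(
    zip(demo_scores, incremental_risk(demo_scores)), start=1
):
    print(f"Year {year}: cumulative {total:.1%}, added this year {added:.1%}")
```
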
## Performance Metrics

| Dataset | 1-Year AUC | 6-Year AUC | Sample Size |
|---------|------------|------------|-------------|
| NLST Test | 0.94 | 0.86 | ~15,000 |
| MGH | 0.86 | 0.75 | ~12,000 |
| CGMH Taiwan | 0.94 | 0.80 | ~8,000 |

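For readers unfamiliar with the AUC values in the table, the sketch below computes AUC as the fraction of (cancer, no-cancer) pairs in which the cancer case receives the higher score, counting ties as half. It is a plain-Python illustration on made-up labels and scores, not the evaluation code behind the reported numbers.

```python
def roc_auc(labels, scores):
    """Compute AUC by pairwise comparison of positive and negative scores.

    labels: 1 for cases that developed cancer, 0 otherwise.
    scores: predicted risk for each case, in the same order as labels.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0  # positive case ranked above negative case
            elif pos == neg:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up example: 3 of the 4 positive/negative pairs are ranked correctly
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, so it is only suitable for small illustrative inputs.
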
## Intended Use

### Primary Use Cases
- Risk stratification in lung cancer screening programs
- Research on lung cancer prediction models
- Clinical decision support (with appropriate oversight)

### Users
- Healthcare providers
- Medical researchers
- Screening program coordinators

### Out of Scope
- Diagnosis of existing cancer
- Use with non-LDCT imaging (X-rays, MRI)
- Sole basis for clinical decisions
- Use outside medical supervision

## Input Requirements

- **Format**: DICOM files from chest CT scan
- **Type**: Low-dose CT (LDCT)
- **Orientation**: Axial view
- **Order**: Anatomically ordered (abdomen → clavicles)
- **Number of slices**: Typically 100-300 slices
- **Resolution**: Automatically handled by model

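As a sketch of how the ordering requirement might be met, the code below sorts axial slices by the z component of the DICOM `ImagePositionPatient` attribute. In the standard DICOM patient coordinate system z increases toward the head, so ascending z runs from abdomen to clavicles. The helper names are hypothetical, and whether the wrapper already reorders slices internally is not stated in this card, so treat this as an optional pre-check.

```python
def order_slices(slices):
    """Return paths sorted by ascending z position (inferior to superior).

    slices: list of (path, z_position) tuples.
    """
    return [path for path, _ in sorted(slices, key=lambda item: item[1])]


def read_slice_positions(paths):
    """Read (path, z_position) for each CT slice using pydicom.

    pydicom is imported lazily so order_slices can be used without it.
    """
    import pydicom

    records = []
    for path in paths:
        # Headers are enough here; skipping pixel data keeps this fast
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        if ds.Modality != "CT":
            raise ValueError(f"{path} is not a CT slice (Modality={ds.Modality})")
        records.append((path, float(ds.ImagePositionPatient[2])))
    return records


# Example with made-up positions; real values come from read_slice_positions
print(order_slices([("c.dcm", 10.0), ("a.dcm", -5.0), ("b.dcm", 2.5)]))
```

With real data, `order_slices(read_slice_positions(dicom_paths))` would produce the list to pass as `dicom_paths`.
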
## ⚠️ Important Considerations

### Medical AI Notice
This model should **supplement, not replace**, clinical judgment. Always consider:
- Complete patient medical history
- Additional risk factors (smoking, family history)
- Current clinical guidelines
- Need for professional medical oversight

### Limitations
- Optimized for screening population (ages 55-80)
- Best performance with LDCT scans
- Not validated for pediatric use
- Performance may vary with different scanner manufacturers

## Citation

If you use this model, please cite the original paper:

```bibtex
@article{mikhael2023sybil,
  title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
  author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and others},
  journal={Journal of Clinical Oncology},
  volume={41},
  number={12},
  pages={2191--2200},
  year={2023},
  publisher={American Society of Clinical Oncology}
}
```

## Acknowledgments

This Hugging Face implementation is based on the original work by:
- **Original Authors**: Peter G. Mikhael & Jeremy Wohlwend
- **Institutions**: MIT CSAIL & Massachusetts General Hospital
- **Original Repository**: [GitHub](https://github.com/reginabarzilaygroup/Sybil)
- **Paper**: [Journal of Clinical Oncology](https://doi.org/10.1200/JCO.22.01345)

## License

MIT License - See [LICENSE](LICENSE) file

- Original Model © 2022 Peter Mikhael & Jeremy Wohlwend
- HF Adaptation © 2024 Lab-Rasool

## Troubleshooting

### Common Issues

1. **Import Error**: Make sure the downloaded model path is on `sys.path` before importing the wrapper modules
   ```python
   sys.path.append(model_path)
   ```

2. **Missing Dependencies**: Install all requirements
   ```bash
   pip install torch torchvision pydicom sybil huggingface-hub
   ```

3. **DICOM Loading Error**: Ensure DICOM files are valid CT scans
   ```python
   import pydicom
   dcm = pydicom.dcmread("your_file.dcm")  # Test single file
   ```

4. **Memory Issues**: The model requires ~4GB of GPU memory; check whether a GPU is available
   ```python
   import torch
   device = 'cuda' if torch.cuda.is_available() else 'cpu'
   ```

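Before debugging import errors, it can help to confirm which dependencies are actually importable in the current environment. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and the module list mirrors the Installation command using import names (for example `huggingface_hub` for the `huggingface-hub` package).

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# Import names for the packages listed under Installation
REQUIRED = ["torch", "torchvision", "pydicom", "huggingface_hub"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

`find_spec` only checks that a module can be located. It does not import it, so it will not catch a broken installation that fails at import time.
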
## Support

- **HF Model Issues**: Open an issue on this repository
- **Original Model**: [GitHub Issues](https://github.com/reginabarzilaygroup/Sybil/issues)
- **Medical Questions**: Consult healthcare professionals

## Additional Resources

- [Original GitHub Repository](https://github.com/reginabarzilaygroup/Sybil)
- [Paper (Open Access)](https://doi.org/10.1200/JCO.22.01345)
- [NLST Dataset Information](https://cdas.cancer.gov/nlst/)
- [Demo Data](https://github.com/reginabarzilaygroup/Sybil/releases)

---

**Note**: This is a research model. Always consult qualified healthcare professionals for medical decisions.