---
license: mit
tags:
- medical
- cancer
- ct-scan
- risk-prediction
- healthcare
- pytorch
- vision
datasets:
- NLST
metrics:
- auc
- c-index
language:
- en
library_name: transformers
pipeline_tag: image-classification
---

# Sybil - Lung Cancer Risk Prediction

## 🎯 Model Description

Sybil is a validated deep learning model that predicts future lung cancer risk from a single low-dose chest CT (LDCT) scan. The model was published in the Journal of Clinical Oncology and produces one risk score for each of the six years following the scan.

### Key Features
- **Single Scan Analysis**: Requires only one LDCT scan
- **Multi-Year Prediction**: Provides risk scores for years 1-6
- **Validated Performance**: Tested across multiple institutions globally
- **Ensemble Approach**: Uses 5 models for robust predictions

## 🚀 Quick Start

### Installation

```bash
pip install huggingface-hub torch torchvision pydicom
```

### Basic Usage

```python
from huggingface_hub import snapshot_download
import sys

# Download model
model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
sys.path.append(model_path)

# Import model
from modeling_sybil_wrapper import SybilHFWrapper
from configuration_sybil import SybilConfig

# Initialize
config = SybilConfig()
model = SybilHFWrapper(config)

# Prepare your DICOM files (CT scan slices)
dicom_paths = ["scan1.dcm", "scan2.dcm", ...]  # Replace with actual paths

# Get predictions
output = model(dicom_paths=dicom_paths)
risk_scores = output.risk_scores.numpy()

# Display results
print("Lung Cancer Risk Predictions:")
for i, score in enumerate(risk_scores):
    print(f"Year {i+1}: {score*100:.1f}%")
```

## 📊 Example with Demo Data

```python
import requests
import zipfile
from io import BytesIO
import os

# Download demo DICOM files
def get_demo_data():
    cache_dir = os.path.expanduser("~/.sybil_demo")
    demo_dir = os.path.join(cache_dir, "sybil_demo_data")

    if not os.path.exists(demo_dir):
        print("Downloading demo data...")
        url = "https://www.dropbox.com/scl/fi/covbvo6f547kak4em3cjd/sybil_example.zip?rlkey=7a13nhlc9uwga9x7pmtk1cf1c&dl=1"
        response = requests.get(url, timeout=60)
        response.raise_for_status()  # fail early instead of unzipping an error page

        os.makedirs(cache_dir, exist_ok=True)
        with zipfile.ZipFile(BytesIO(response.content)) as zf:
            zf.extractall(cache_dir)

    # Find DICOM files
    dicom_files = []
    for root, dirs, files in os.walk(cache_dir):
        for file in files:
            if file.endswith('.dcm'):
                dicom_files.append(os.path.join(root, file))

    return sorted(dicom_files)

# Run demo
from huggingface_hub import snapshot_download
import sys

# Load model
model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
sys.path.append(model_path)

from modeling_sybil_wrapper import SybilHFWrapper
from configuration_sybil import SybilConfig

# Initialize and predict
config = SybilConfig()
model = SybilHFWrapper(config)

dicom_files = get_demo_data()
output = model(dicom_paths=dicom_files)

# Show results
for i, score in enumerate(output.risk_scores.numpy()):
    print(f"Year {i+1}: {score*100:.1f}% risk")
```

Expected output for demo data:
```
Year 1: 2.2% risk
Year 2: 4.5% risk
Year 3: 7.2% risk
Year 4: 7.9% risk
Year 5: 9.6% risk
Year 6: 13.6% risk
```
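
The demo output above rises from year to year, which is what a cumulative risk would do. As a minimal sketch, the hypothetical helper `summarize_risk` below formats a sequence of scores in the same style and rejects values that fall outside [0, 1] or decrease over time. Treating the scores as cumulative probabilities is an assumption made for illustration; this README does not state it explicitly.

```python
from typing import List, Sequence


def summarize_risk(scores: Sequence[float]) -> List[str]:
    """Format per-year risk scores after two basic sanity checks.

    Assumption (not confirmed by the model card): scores are cumulative
    probabilities, so they lie in [0, 1] and never decrease over time.
    """
    scores = list(scores)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("risk scores are expected to lie in [0, 1]")
    if any(later < earlier for earlier, later in zip(scores, scores[1:])):
        raise ValueError("risk should not decrease from one year to the next")
    return [
        f"Year {year}: {score * 100:.1f}% risk"
        for year, score in enumerate(scores, start=1)
    ]


if __name__ == "__main__":
    # In practice the input would be output.risk_scores.numpy().tolist()
    for line in summarize_risk([0.022, 0.045, 0.072]):
        print(line)
```

A failed check is more likely to point at a preprocessing problem (wrong series, wrong slice order) than at the patient, so it is worth surfacing before anyone reads the numbers.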

## 📈 Performance Metrics

| Dataset | 1-Year AUC | 6-Year AUC | Sample Size |
|---------|------------|------------|-------------|
| NLST Test | 0.94 | 0.86 | ~15,000 |
| MGH | 0.86 | 0.75 | ~12,000 |
| CGMH Taiwan | 0.94 | 0.80 | ~8,000 |

## πŸ₯ Intended Use

### Primary Use Cases
- Risk stratification in lung cancer screening programs
- Research on lung cancer prediction models
- Clinical decision support (with appropriate oversight)

### Users
- Healthcare providers
- Medical researchers
- Screening program coordinators

### Out of Scope
- ❌ Diagnosis of existing cancer
- ❌ Use with non-LDCT imaging (X-rays, MRI)
- ❌ Sole basis for clinical decisions
- ❌ Use outside medical supervision

## 📋 Input Requirements

- **Format**: DICOM files from chest CT scan
- **Type**: Low-dose CT (LDCT)
- **Orientation**: Axial view
- **Order**: Anatomically ordered (abdomen → clavicles)
- **Number of slices**: Typically 100-300 slices
- **Resolution**: Automatically handled by model
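
File names do not always reflect anatomical order, so sorting paths alphabetically may not satisfy the ordering requirement above. The sketch below orders slices by the z component of the DICOM `ImagePositionPatient` attribute, which increases toward the head in the DICOM patient coordinate system. The helper names `read_slice_z` and `order_slices` are hypothetical, and whether the wrapper already reorders slices internally is not stated here, so treat this as an optional precaution.

```python
from typing import Callable, Iterable, List


def read_slice_z(path: str) -> float:
    """Return the z coordinate of one slice, read from its DICOM header."""
    import pydicom  # imported lazily so order_slices has no hard dependency

    # stop_before_pixels skips the image data; only the header is needed
    header = pydicom.dcmread(path, stop_before_pixels=True)
    return float(header.ImagePositionPatient[2])


def order_slices(
    paths: Iterable[str],
    read_z: Callable[[str], float] = read_slice_z,
) -> List[str]:
    """Sort slice paths by ascending z, i.e. inferior to superior."""
    return sorted(paths, key=read_z)
```

Usage would look like `dicom_paths = order_slices(dicom_paths)` before calling the model. The `read_z` parameter exists so the ordering logic can be checked without real DICOM files.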

## ⚠️ Important Considerations

### Medical AI Notice
This model should **supplement, not replace**, clinical judgment. Always consider:
- Complete patient medical history
- Additional risk factors (smoking, family history)
- Current clinical guidelines
- Need for professional medical oversight

### Limitations
- Optimized for screening population (ages 55-80)
- Best performance with LDCT scans
- Not validated for pediatric use
- Performance may vary with different scanner manufacturers

## 📚 Citation

If you use this model, please cite the original paper:

```bibtex
@article{mikhael2023sybil,
  title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
  author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and others},
  journal={Journal of Clinical Oncology},
  volume={41},
  number={12},
  pages={2191--2200},
  year={2023},
  publisher={American Society of Clinical Oncology}
}
```

## πŸ™ Acknowledgments

This Hugging Face implementation is based on the original work by:
- **Original Authors**: Peter G. Mikhael & Jeremy Wohlwend
- **Institutions**: MIT CSAIL & Massachusetts General Hospital
- **Original Repository**: [GitHub](https://github.com/reginabarzilaygroup/Sybil)
- **Paper**: [Journal of Clinical Oncology](https://doi.org/10.1200/JCO.22.01345)

## 📄 License

MIT License - See [LICENSE](LICENSE) file

- Original Model © 2022 Peter Mikhael & Jeremy Wohlwend
- HF Adaptation © 2024 Lab-Rasool

## 🔧 Troubleshooting

### Common Issues

1. **Import Error**: Append the downloaded model path to `sys.path` before importing the model modules
   ```python
   sys.path.append(model_path)
   ```

2. **Missing Dependencies**: Install all requirements
   ```bash
   pip install torch torchvision pydicom sybil huggingface-hub
   ```

3. **DICOM Loading Error**: Ensure DICOM files are valid CT scans
   ```python
   import pydicom
   dcm = pydicom.dcmread("your_file.dcm")  # Test single file
   ```

4. **Memory Issues**: Model requires ~4GB GPU memory
   ```python
   import torch
   device = 'cuda' if torch.cuda.is_available() else 'cpu'
   ```
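
For the DICOM loading issue, checking one file at a time gets tedious for a series of a few hundred slices. The following is a minimal sketch of a pre-flight check that lists every file that either cannot be parsed or does not report `CT` as its `Modality`. The helper names `read_modality` and `find_non_ct` are hypothetical and not part of this repository.

```python
from typing import Callable, Iterable, List, Optional


def read_modality(path: str) -> Optional[str]:
    """Return the DICOM Modality of a file, or None if it cannot be read."""
    import pydicom  # imported lazily so find_non_ct has no hard dependency
    from pydicom.errors import InvalidDicomError

    try:
        # Header only: pixel data is not needed for this check
        header = pydicom.dcmread(path, stop_before_pixels=True)
    except (InvalidDicomError, OSError):
        return None
    return getattr(header, "Modality", None)


def find_non_ct(
    paths: Iterable[str],
    read: Callable[[str], Optional[str]] = read_modality,
) -> List[str]:
    """Return the paths that are unreadable or whose Modality is not CT."""
    return [path for path in paths if read(path) != "CT"]
```

An empty result from `find_non_ct(dicom_paths)` means every file parsed and is tagged as CT; it does not confirm that the scan is low-dose or axial.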

## 📬 Support

- **HF Model Issues**: Open an issue on this repository
- **Original Model**: [GitHub Issues](https://github.com/reginabarzilaygroup/Sybil/issues)
- **Medical Questions**: Consult healthcare professionals

## πŸ” Additional Resources

- [Original GitHub Repository](https://github.com/reginabarzilaygroup/Sybil)
- [Paper (Open Access)](https://doi.org/10.1200/JCO.22.01345)
- [NLST Dataset Information](https://cdas.cancer.gov/nlst/)
- [Demo Data](https://github.com/reginabarzilaygroup/Sybil/releases)

---

**Note**: This is a research model. Always consult qualified healthcare professionals for medical decisions.