Aakash-Tripathi committed on
Commit e49852d · verified · 1 Parent(s): 8cf978d

Update README.md

Files changed (1): README.md +178 -54
@@ -21,7 +21,7 @@ pipeline_tag: image-classification

  # Sybil - Lung Cancer Risk Prediction

- ## Model Description

  Sybil is a validated deep learning model that predicts future lung cancer risk from a single low-dose chest CT (LDCT) scan. Published in the Journal of Clinical Oncology, this model can assess cancer risk over a 1-6 year timeframe.

@@ -31,46 +31,118 @@ Sybil is a validated deep learning model that predicts future lung cancer risk f
  - **Validated Performance**: Tested across multiple institutions globally
  - **Ensemble Approach**: Uses 5 models for robust predictions

- ## Model Details

- - **Developed by**: MIT CSAIL & Mass General Cancer Center (Original)
- - **Adapted by**: Lab-Rasool (Hugging Face version)
- - **Model type**: 3D Convolutional Neural Network
- - **Architecture**: 3D ResNet-18 with multi-attention pooling
- - **Input**: LDCT scans (200 slices × 256×256 pixels)
- - **Output**: 6 risk scores (years 1-6)
- - **License**: MIT

- ## Performance Metrics
-
- | Dataset | 1-Year AUC | 6-Year AUC |
- |---------|------------|------------|
- | NLST Test | 0.94 | 0.86 |
- | MGH | 0.86 | 0.75 |
- | CGMH Taiwan | 0.94 | 0.80 |

- ## Usage

  ```python
- from huggingface_sybil import SybilHFWrapper, SybilConfig

- # Load model
  config = SybilConfig()
- model = SybilHFWrapper.from_pretrained("Lab-Rasool/sybil")

- # Prepare DICOM files
- dicom_paths = ["scan1.dcm", "scan2.dcm", ...]

  # Get predictions
  output = model(dicom_paths=dicom_paths)
- risk_scores = output.risk_scores

  # Display results
- for year, score in enumerate(risk_scores, 1):
-     print(f"Year {year}: {score:.1%} risk")
  ```

- ## Intended Use

  ### Primary Use Cases
  - Risk stratification in lung cancer screening programs
@@ -83,40 +155,43 @@ for year, score in enumerate(risk_scores, 1):
  - Screening program coordinators

  ### Out of Scope
- - Diagnosis of existing cancer
- - Use with non-LDCT imaging (X-rays, MRI)
- - Sole basis for clinical decisions

- ## Training Data

- Trained on the National Lung Screening Trial (NLST) dataset:
- - ~50,000 participants
- - Ages 55-74
- - Current/former heavy smokers
- - 3 annual LDCT scans

- ## Ethical Considerations

- ⚠️ **Medical AI Notice**: This model should supplement, not replace, clinical judgment. Always consider:
- - Complete patient history
- - Other risk factors
- - Current screening guidelines
- - Need for human oversight

- ## Limitations

- - Optimized for screening-eligible population (55-80 years)
- - Requires LDCT scans specifically
- - Performance may vary across different CT scanners
- - Not validated for non-screening populations

- ## Citation

- **Original Paper:**
  ```bibtex
  @article{mikhael2023sybil,
    title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
-   author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and Karstens, Ludvig and Xiang, Justin and Takigami, Angelo K and Bourgouin, Patrick P and Chan, PuiYee and Mrah, Sofiane and Amayri, Wael and others},
    journal={Journal of Clinical Oncology},
    volume={41},
    number={12},
@@ -126,11 +201,60 @@ Trained on the National Lung Screening Trial (NLST) dataset:
  }
  ```

- ## Acknowledgments

- This Hugging Face implementation is based on the original work by Peter G. Mikhael, Jeremy Wohlwend, and the team at MIT CSAIL and Massachusetts General Hospital. Original model and code available at [GitHub](https://github.com/reginabarzilaygroup/Sybil).

- ## Model Card Contact

- For questions about this Hugging Face implementation: Lab-Rasool
- For questions about the original model: See the [original repository](https://github.com/reginabarzilaygroup/Sybil)


  # Sybil - Lung Cancer Risk Prediction

+ ## 🎯 Model Description

  Sybil is a validated deep learning model that predicts future lung cancer risk from a single low-dose chest CT (LDCT) scan. Published in the Journal of Clinical Oncology, this model can assess cancer risk over a 1-6 year timeframe.


  - **Validated Performance**: Tested across multiple institutions globally
  - **Ensemble Approach**: Uses 5 models for robust predictions

+ ## 🚀 Quick Start

+ ### Installation

+ ```bash
+ pip install huggingface-hub torch torchvision pydicom sybil
+ ```

+ ### Basic Usage

  ```python
+ from huggingface_hub import snapshot_download
+ import sys

+ # Download model
+ model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
+ sys.path.append(model_path)
+
+ # Import model
+ from modeling_sybil_wrapper import SybilHFWrapper
+ from configuration_sybil import SybilConfig
+
+ # Initialize
  config = SybilConfig()
+ model = SybilHFWrapper(config)

+ # Prepare your DICOM files (CT scan slices)
+ dicom_paths = ["scan1.dcm", "scan2.dcm", ...] # Replace with actual paths

  # Get predictions
  output = model(dicom_paths=dicom_paths)
+ risk_scores = output.risk_scores.numpy()

  # Display results
+ print("Lung Cancer Risk Predictions:")
+ for i, score in enumerate(risk_scores):
+     print(f"Year {i+1}: {score*100:.1f}%")
  ```
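
The `dicom_paths` placeholder in the snippet above has to be filled with one path per CT slice. A minimal sketch of one way to build that list, assuming all slices of a single series sit under one directory and carry a `.dcm` extension. The helper name `collect_dicom_paths` is hypothetical and is not part of the Sybil wrapper.

```python
from pathlib import Path


def collect_dicom_paths(series_dir):
    """Return sorted paths of all .dcm files found under series_dir.

    Hypothetical helper for illustration; not part of the Sybil wrapper.
    """
    root = Path(series_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # Match the extension case-insensitively, since exports may use .DCM.
    paths = sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".dcm"
    )
    if not paths:
        raise FileNotFoundError(f"no .dcm files found under {root}")
    return paths
```

The result could then be passed as `dicom_paths=collect_dicom_paths("path/to/series")`. Sorting by file name is only a convenience and does not by itself guarantee anatomical slice order.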

+ ## 📊 Example with Demo Data
+
+ ```python
+ import requests
+ import zipfile
+ from io import BytesIO
+ import os
+
+ # Download demo DICOM files
+ def get_demo_data():
+     cache_dir = os.path.expanduser("~/.sybil_demo")
+     demo_dir = os.path.join(cache_dir, "sybil_demo_data")
+
+     if not os.path.exists(demo_dir):
+         print("Downloading demo data...")
+         url = "https://www.dropbox.com/scl/fi/covbvo6f547kak4em3cjd/sybil_example.zip?rlkey=7a13nhlc9uwga9x7pmtk1cf1c&dl=1"
+         response = requests.get(url)
+
+         os.makedirs(cache_dir, exist_ok=True)
+         with zipfile.ZipFile(BytesIO(response.content)) as zf:
+             zf.extractall(cache_dir)
+
+     # Find DICOM files
+     dicom_files = []
+     for root, dirs, files in os.walk(cache_dir):
+         for file in files:
+             if file.endswith('.dcm'):
+                 dicom_files.append(os.path.join(root, file))
+
+     return sorted(dicom_files)
+
+ # Run demo
+ from huggingface_hub import snapshot_download
+ import sys
+
+ # Load model
+ model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
+ sys.path.append(model_path)
+
+ from modeling_sybil_wrapper import SybilHFWrapper
+ from configuration_sybil import SybilConfig
+
+ # Initialize and predict
+ config = SybilConfig()
+ model = SybilHFWrapper(config)
+
+ dicom_files = get_demo_data()
+ output = model(dicom_paths=dicom_files)
+
+ # Show results
+ for i, score in enumerate(output.risk_scores.numpy()):
+     print(f"Year {i+1}: {score*100:.1f}% risk")
+ ```
+
+ Expected output for demo data:
+ ```
+ Year 1: 2.2% risk
+ Year 2: 4.5% risk
+ Year 3: 7.2% risk
+ Year 4: 7.9% risk
+ Year 5: 9.6% risk
+ Year 6: 13.6% risk
+ ```
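
The demo scores rise from year to year, which is consistent with reading each value as the cumulative probability of a diagnosis within that many years. Assuming that reading (the README does not state it explicitly), a simple sanity check on any prediction is that the six scores never decrease. The sketch below uses the demo values listed above; `is_non_decreasing` is a hypothetical helper, not part of the wrapper.

```python
def is_non_decreasing(scores, tol=1e-9):
    """Return True if each score is at least as large as the previous one."""
    return all(b >= a - tol for a, b in zip(scores, scores[1:]))


# Demo values from the expected output, expressed as fractions.
demo_scores = [0.022, 0.045, 0.072, 0.079, 0.096, 0.136]

for year, score in enumerate(demo_scores, start=1):
    print(f"Year {year}: {score * 100:.1f}% risk")

print("non-decreasing:", is_non_decreasing(demo_scores))
```

A failed check would not necessarily indicate a bug, but under the cumulative-risk assumption it would be worth inspecting the inputs before relying on the numbers.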
+
+ ## 📈 Performance Metrics
+
+ | Dataset | 1-Year AUC | 6-Year AUC | Sample Size |
+ |---------|------------|------------|-------------|
+ | NLST Test | 0.94 | 0.86 | ~15,000 |
+ | MGH | 0.86 | 0.75 | ~12,000 |
+ | CGMH Taiwan | 0.94 | 0.80 | ~8,000 |
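
An AUC value like those in the table can be read as the probability that a randomly chosen positive case (cancer diagnosed within the horizon) receives a higher risk score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows. The scores and labels are toy values invented for this example; they are not Sybil outputs or study data.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example only.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(pairwise_auc(toy_scores, toy_labels))  # 0.75
```

Here three of the four positive/negative pairs are ordered correctly, giving 0.75. The quadratic loop is fine for illustration; a rank-based implementation would be preferable for large cohorts.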
+
+ ## 🏥 Intended Use

  ### Primary Use Cases
  - Risk stratification in lung cancer screening programs

  - Screening program coordinators

  ### Out of Scope
+ - ❌ Diagnosis of existing cancer
+ - ❌ Use with non-LDCT imaging (X-rays, MRI)
+ - ❌ Sole basis for clinical decisions
+ - ❌ Use outside medical supervision

+ ## 📋 Input Requirements

+ - **Format**: DICOM files from chest CT scan
+ - **Type**: Low-dose CT (LDCT)
+ - **Orientation**: Axial view
+ - **Order**: Anatomically ordered (abdomen → clavicles)
+ - **Number of slices**: Typically 100-300 slices
+ - **Resolution**: Automatically handled by model
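
The ordering requirement above can be met by sorting slices on their position along the patient's head-foot axis. In the DICOM patient coordinate system, the third component of `ImagePositionPatient` increases toward the head, so an ascending sort runs from abdomen to clavicles. The sketch below assumes an axial series whose slices all carry that attribute; `sort_slices_inferior_to_superior` is a hypothetical helper, and whether the wrapper already reorders slices internally is not stated in this README.

```python
def sort_slices_inferior_to_superior(paths, read_z=None):
    """Sort slice paths by z position, lowest (abdomen side) first.

    read_z maps a path to the slice's z coordinate. By default it reads the
    third component of ImagePositionPatient with pydicom.
    """
    if read_z is None:
        import pydicom  # imported lazily so the helper can be tested without it

        def read_z(path):
            # Skip pixel data: only the header is needed for sorting.
            ds = pydicom.dcmread(path, stop_before_pixels=True)
            return float(ds.ImagePositionPatient[2])

    return sorted(paths, key=read_z)
```

Passing a custom `read_z` makes it possible to sort on a different attribute, or to test the ordering logic without real DICOM files.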

+ ## ⚠️ Important Considerations

+ ### Medical AI Notice
+ This model should **supplement, not replace**, clinical judgment. Always consider:
+ - Complete patient medical history
+ - Additional risk factors (smoking, family history)
+ - Current clinical guidelines
+ - Need for professional medical oversight

+ ### Limitations
+ - Optimized for screening population (ages 55-80)
+ - Best performance with LDCT scans
+ - Not validated for pediatric use
+ - Performance may vary with different scanner manufacturers

+ ## 📚 Citation

+ If you use this model, please cite the original paper:

  ```bibtex
  @article{mikhael2023sybil,
    title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
+   author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and others},
    journal={Journal of Clinical Oncology},
    volume={41},
    number={12},

  }
  ```

+ ## 🙏 Acknowledgments
+
+ This Hugging Face implementation is based on the original work by:
+ - **Original Authors**: Peter G. Mikhael & Jeremy Wohlwend
+ - **Institutions**: MIT CSAIL & Massachusetts General Hospital
+ - **Original Repository**: [GitHub](https://github.com/reginabarzilaygroup/Sybil)
+ - **Paper**: [Journal of Clinical Oncology](https://doi.org/10.1200/JCO.22.01345)
+
+ ## 📄 License
+
+ MIT License - See [LICENSE](LICENSE) file
+
+ - Original Model © 2022 Peter Mikhael & Jeremy Wohlwend
+ - HF Adaptation © 2024 Lab-Rasool

+ ## 🔧 Troubleshooting

+ ### Common Issues
+
+ 1. **Import Error**: Make sure to append model path to sys.path
+    ```python
+    sys.path.append(model_path)
+    ```
+
+ 2. **Missing Dependencies**: Install all requirements
+    ```bash
+    pip install torch torchvision pydicom sybil huggingface-hub
+    ```
+
+ 3. **DICOM Loading Error**: Ensure DICOM files are valid CT scans
+    ```python
+    import pydicom
+    dcm = pydicom.dcmread("your_file.dcm") # Test single file
+    ```
+
+ 4. **Memory Issues**: Model requires ~4GB GPU memory
+    ```python
+    import torch
+    device = 'cuda' if torch.cuda.is_available() else 'cpu'
+    ```
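
For the DICOM loading error in item 3, a cheap pre-check can separate files that are not DICOM at all from files that fail for other reasons. DICOM Part 10 files begin with a 128-byte preamble followed by the four bytes `DICM`. The sketch below tests only for that marker; `has_dicom_marker` is a hypothetical helper, and files written without the preamble will fail this check even though pydicom may still be able to read them with `force=True`.

```python
def has_dicom_marker(path):
    """Return True if the file has the 'DICM' marker after a 128-byte preamble."""
    with open(path, "rb") as f:
        header = f.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"
```

Running this over a directory before calling the model can help narrow a failure down to a specific stray file, such as a report or thumbnail exported alongside the slices.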
+
+ ## 📬 Support
+
+ - **HF Model Issues**: Open issue on this repository
+ - **Original Model**: [GitHub Issues](https://github.com/reginabarzilaygroup/Sybil/issues)
+ - **Medical Questions**: Consult healthcare professionals
+
+ ## 🔍 Additional Resources
+
+ - [Original GitHub Repository](https://github.com/reginabarzilaygroup/Sybil)
+ - [Paper (Open Access)](https://doi.org/10.1200/JCO.22.01345)
+ - [NLST Dataset Information](https://cdas.cancer.gov/nlst/)
+ - [Demo Data](https://github.com/reginabarzilaygroup/Sybil/releases)
+
+ ---

+ **Note**: This is a research model. Always consult qualified healthcare professionals for medical decisions.