Lab-Rasool
/

sybil

@@ -21,7 +21,7 @@ pipeline_tag: image-classification
 # Sybil - Lung Cancer Risk Prediction
-## Model Description
 Sybil is a validated deep learning model that predicts future lung cancer risk from a single low-dose chest CT (LDCT) scan. Published in the Journal of Clinical Oncology, this model can assess cancer risk over a 1-6 year timeframe.
@@ -31,46 +31,118 @@ Sybil is a validated deep learning model that predicts future lung cancer risk f
 - **Validated Performance**: Tested across multiple institutions globally
 - **Ensemble Approach**: Uses 5 models for robust predictions
-## Model Details
-- **Developed by**: MIT CSAIL & Mass General Cancer Center (Original)
-- **Adapted by**: Lab-Rasool (Hugging Face version)
-- **Model type**: 3D Convolutional Neural Network
-- **Architecture**: 3D ResNet-18 with multi-attention pooling
-- **Input**: LDCT scans (200 slices × 256×256 pixels)
-- **Output**: 6 risk scores (years 1-6)
-- **License**: MIT
-## Performance Metrics
-| Dataset | 1-Year AUC | 6-Year AUC |
-|---------|------------|------------|
-| NLST Test | 0.94 | 0.86 |
-| MGH | 0.86 | 0.75 |
-| CGMH Taiwan | 0.94 | 0.80 |
-## Usage
 ```python
-from huggingface_sybil import SybilHFWrapper, SybilConfig
-# Load model
 config = SybilConfig()
-model = SybilHFWrapper.from_pretrained("Lab-Rasool/sybil")
-# Prepare DICOM files
-dicom_paths = ["scan1.dcm", "scan2.dcm", ...]
 # Get predictions
 output = model(dicom_paths=dicom_paths)
-risk_scores = output.risk_scores
 # Display results
-for year, score in enumerate(risk_scores, 1):
-    print(f"Year {year}: {score:.1%} risk")
 ```
-## Intended Use
 ### Primary Use Cases
 - Risk stratification in lung cancer screening programs
@@ -83,40 +155,43 @@ for year, score in enumerate(risk_scores, 1):
 - Screening program coordinators
 ### Out of Scope
-- Diagnosis of existing cancer
-- Use with non-LDCT imaging (X-rays, MRI)
-- Sole basis for clinical decisions
-## Training Data
-Trained on the National Lung Screening Trial (NLST) dataset:
-- ~50,000 participants
-- Ages 55-74
-- Current/former heavy smokers
-- 3 annual LDCT scans
-## Ethical Considerations
-⚠️ **Medical AI Notice**: This model should supplement, not replace, clinical judgment. Always consider:
-- Complete patient history
-- Other risk factors
-- Current screening guidelines
-- Need for human oversight
-## Limitations
-- Optimized for screening-eligible population (55-80 years)
-- Requires LDCT scans specifically
-- Performance may vary across different CT scanners
-- Not validated for non-screening populations
-## Citation
-**Original Paper:**
 ```bibtex
 @article{mikhael2023sybil,
   title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
-  author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and Karstens, Ludvig and Xiang, Justin and Takigami, Angelo K and Bourgouin, Patrick P and Chan, PuiYee and Mrah, Sofiane and Amayri, Wael and others},
   journal={Journal of Clinical Oncology},
   volume={41},
   number={12},
@@ -126,11 +201,60 @@ Trained on the National Lung Screening Trial (NLST) dataset:
 }
 ```
-## Acknowledgments
-This Hugging Face implementation is based on the original work by Peter G. Mikhael, Jeremy Wohlwend, and the team at MIT CSAIL and Massachusetts General Hospital. Original model and code available at [GitHub](https://github.com/reginabarzilaygroup/Sybil).
-## Model Card Contact
-For questions about this Hugging Face implementation: Lab-Rasool
-For questions about the original model: See the [original repository](https://github.com/reginabarzilaygroup/Sybil)

 # Sybil - Lung Cancer Risk Prediction
+## 🎯 Model Description
 Sybil is a validated deep learning model that predicts future lung cancer risk from a single low-dose chest CT (LDCT) scan. Published in the Journal of Clinical Oncology, this model can assess cancer risk over a 1-6 year timeframe.
 - **Validated Performance**: Tested across multiple institutions globally
 - **Ensemble Approach**: Uses 5 models for robust predictions
+## 🚀 Quick Start
+### Installation
+```bash
+pip install huggingface-hub torch torchvision pydicom sybil
+```
+### Basic Usage
 ```python
+from huggingface_hub import snapshot_download
+import sys
+# Download model
+model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
+sys.path.append(model_path)
+# Import model
+from modeling_sybil_wrapper import SybilHFWrapper
+from configuration_sybil import SybilConfig
+# Initialize
 config = SybilConfig()
+model = SybilHFWrapper(config)
+# Prepare your DICOM files (CT scan slices)
+dicom_paths = ["scan1.dcm", "scan2.dcm", ...]  # Replace with actual paths
 # Get predictions
 output = model(dicom_paths=dicom_paths)
+risk_scores = output.risk_scores.numpy()
 # Display results
+print("Lung Cancer Risk Predictions:")
+for i, score in enumerate(risk_scores):
+    print(f"Year {i+1}: {score*100:.1f}%")
 ```
+## 📊 Example with Demo Data
+```python
+import requests
+import zipfile
+from io import BytesIO
+import os
+# Download demo DICOM files
+def get_demo_data():
+    cache_dir = os.path.expanduser("~/.sybil_demo")
+    demo_dir = os.path.join(cache_dir, "sybil_demo_data")
+    if not os.path.exists(demo_dir):
+        print("Downloading demo data...")
+        url = "https://www.dropbox.com/scl/fi/covbvo6f547kak4em3cjd/sybil_example.zip?rlkey=7a13nhlc9uwga9x7pmtk1cf1c&dl=1"
+        response = requests.get(url, timeout=60)
+        response.raise_for_status()  # Fail early instead of trying to unzip an error page
+        os.makedirs(cache_dir, exist_ok=True)
+        with zipfile.ZipFile(BytesIO(response.content)) as zf:
+            zf.extractall(cache_dir)
+    # Find DICOM files
+    dicom_files = []
+    for root, dirs, files in os.walk(cache_dir):
+        for file in files:
+            if file.endswith('.dcm'):
+                dicom_files.append(os.path.join(root, file))
+    return sorted(dicom_files)
+# Run demo
+from huggingface_hub import snapshot_download
+import sys
+# Load model
+model_path = snapshot_download(repo_id="Lab-Rasool/sybil")
+sys.path.append(model_path)
+from modeling_sybil_wrapper import SybilHFWrapper
+from configuration_sybil import SybilConfig
+# Initialize and predict
+config = SybilConfig()
+model = SybilHFWrapper(config)
+dicom_files = get_demo_data()
+output = model(dicom_paths=dicom_files)
+# Show results
+for i, score in enumerate(output.risk_scores.numpy()):
+    print(f"Year {i+1}: {score*100:.1f}% risk")
+```
+Expected output for demo data:
+```
+Year 1: 2.2% risk
+Year 2: 4.5% risk
+Year 3: 7.2% risk
+Year 4: 7.9% risk
+Year 5: 9.6% risk
+Year 6: 13.6% risk
+```
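The demo values rise from year 1 to year 6, which is what one would expect if each score is read as the cumulative risk of a diagnosis within that many years. The following is a minimal sketch of a sanity check under that assumption. `check_cumulative` is a hypothetical helper, not part of the Sybil package, and the input is the demo output listed above.

```python
def check_cumulative(risk_scores, tolerance=1e-6):
    """Return True if the scores never decrease from one year to the next."""
    return all(
        later >= earlier - tolerance
        for earlier, later in zip(risk_scores, risk_scores[1:])
    )


# Demo output from above, expressed as fractions rather than percentages.
demo_scores = [0.022, 0.045, 0.072, 0.079, 0.096, 0.136]
print(check_cumulative(demo_scores))
```

A `False` result would suggest the slices were passed in an unexpected order or the scores were post-processed incorrectly, though this check alone cannot tell which.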
+## 📈 Performance Metrics
+| Dataset | 1-Year AUC | 6-Year AUC | Sample Size |
+|---------|------------|------------|-------------|
+| NLST Test | 0.94 | 0.86 | ~15,000 |
+| MGH | 0.86 | 0.75 | ~12,000 |
+| CGMH Taiwan | 0.94 | 0.80 | ~8,000 |
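The AUC values in the table can be read as the probability that a randomly chosen scan from a patient who later developed cancer receives a higher risk score than a randomly chosen scan from a patient who did not. Below is a minimal sketch of that pairwise definition. The labels and scores are toy values made up for illustration and are unrelated to the datasets in the table; `pairwise_auc` is a hypothetical helper.

```python
def pairwise_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n_pos * n_neg), which is
    fine for small inputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = cancer within the horizon, 0 = no cancer.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.30, 0.05, 0.08, 0.10, 0.02]
print(pairwise_auc(toy_labels, toy_scores))
```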
+## 🏥 Intended Use
 ### Primary Use Cases
 - Risk stratification in lung cancer screening programs
 - Screening program coordinators
 ### Out of Scope
+- ❌ Diagnosis of existing cancer
+- ❌ Use with non-LDCT imaging (X-rays, MRI)
+- ❌ Sole basis for clinical decisions
+- ❌ Use outside medical supervision
+## 📋 Input Requirements
+- **Format**: DICOM files from chest CT scan
+- **Type**: Low-dose CT (LDCT)
+- **Orientation**: Axial view
+- **Order**: Anatomically ordered (abdomen → clavicles)
+- **Number of slices**: Typically 100-300
+- **Resolution**: Automatically handled by model
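The ordering requirement above can be sketched as a sort on the z component of the DICOM `ImagePositionPatient` attribute. This assumes that "abdomen → clavicles" corresponds to ascending z, which follows from the DICOM patient coordinate convention (z increases toward the head). Whether the wrapper already reorders slices internally is not stated here. `sort_slices_inferior_to_superior` and its `read_header` parameter are hypothetical names introduced for illustration.

```python
from types import SimpleNamespace


def sort_slices_inferior_to_superior(paths, read_header=None):
    """Sort DICOM paths by the z component of ImagePositionPatient, ascending."""
    if read_header is None:
        import pydicom  # Imported lazily so the helper can be tried without it

        def read_header(path):
            # Skip pixel data: only the header is needed for ordering.
            return pydicom.dcmread(path, stop_before_pixels=True)

    def z_position(path):
        return float(read_header(path).ImagePositionPatient[2])

    return sorted(paths, key=z_position)


# Stand-in headers so the sketch runs without real DICOM files.
fake_headers = {
    "a.dcm": SimpleNamespace(ImagePositionPatient=[0.0, 0.0, -120.0]),
    "b.dcm": SimpleNamespace(ImagePositionPatient=[0.0, 0.0, -117.5]),
    "c.dcm": SimpleNamespace(ImagePositionPatient=[0.0, 0.0, -122.5]),
}
ordered = sort_slices_inferior_to_superior(
    list(fake_headers), read_header=fake_headers.__getitem__
)
print(ordered)
```

With real files, calling `sort_slices_inferior_to_superior(dicom_paths)` without `read_header` reads each header through pydicom.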
+## ⚠️ Important Considerations
+### Medical AI Notice
+This model should **supplement, not replace**, clinical judgment. Always consider:
+- Complete patient medical history
+- Additional risk factors (smoking, family history)
+- Current clinical guidelines
+- Need for professional medical oversight
+### Limitations
+- Optimized for screening population (ages 55-80)
+- Best performance with LDCT scans
+- Not validated for pediatric use
+- Performance may vary with different scanner manufacturers
+## 📚 Citation
+If you use this model, please cite the original paper:
 ```bibtex
 @article{mikhael2023sybil,
   title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
+  author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and others},
   journal={Journal of Clinical Oncology},
   volume={41},
   number={12},
 }
 ```
+## 🙏 Acknowledgments
+This Hugging Face implementation is based on the original work by:
+- **Original Authors**: Peter G. Mikhael & Jeremy Wohlwend
+- **Institutions**: MIT CSAIL & Massachusetts General Hospital
+- **Original Repository**: [GitHub](https://github.com/reginabarzilaygroup/Sybil)
+- **Paper**: [Journal of Clinical Oncology](https://doi.org/10.1200/JCO.22.01345)
+## 📄 License
+MIT License - See [LICENSE](LICENSE) file
+- Original Model © 2022 Peter Mikhael & Jeremy Wohlwend
+- HF Adaptation © 2024 Lab-Rasool
+## 🔧 Troubleshooting
+### Common Issues
+1. **Import Error**: Append the downloaded model path to `sys.path` before importing the wrapper modules
+   ```python
+   sys.path.append(model_path)
+   ```
+2. **Missing Dependencies**: Install all requirements
+   ```bash
+   pip install torch torchvision pydicom sybil huggingface-hub
+   ```
+3. **DICOM Loading Error**: Ensure the DICOM files are valid CT scans by reading a single file with pydicom first
+   ```python
+   import pydicom
+   dcm = pydicom.dcmread("your_file.dcm")  # Test single file
+   ```
+4. **Memory Issues**: The model requires ~4GB of GPU memory; check whether a GPU is available and fall back to CPU if not
+   ```python
+   import torch
+   device = 'cuda' if torch.cuda.is_available() else 'cpu'
+   ```
+## 📬 Support
+- **HF Model Issues**: Open an issue on this repository
+- **Original Model**: [GitHub Issues](https://github.com/reginabarzilaygroup/Sybil/issues)
+- **Medical Questions**: Consult healthcare professionals
+## 🔍 Additional Resources
+- [Original GitHub Repository](https://github.com/reginabarzilaygroup/Sybil)
+- [Paper (Open Access)](https://doi.org/10.1200/JCO.22.01345)
+- [NLST Dataset Information](https://cdas.cancer.gov/nlst/)
+- [Demo Data](https://github.com/reginabarzilaygroup/Sybil/releases)
+---
+**Note**: This is a research model. Always consult qualified healthcare professionals for medical decisions.