---
license: other
license_name: exaonepath
license_link: LICENSE
tags:
- lg-ai
- EXAONEPath-1.5
- pathology
---

# EXAONE Path 1.5

## Introduction

EXAONE Path 1.5 is a whole-slide-image-level (WSI-level) classification framework designed for downstream tasks in pathology, such as cancer subtyping, molecular subtyping, and mutation prediction. It builds upon our previous work, EXAONE Path 1.0, which focused on patch-wise feature extraction: a WSI is divided into patches and each patch is embedded into a feature vector.

In EXAONE Path 1.5, we extend this pipeline to take an entire WSI as input. Each patch is first processed with the pretrained EXAONE Path 1.0 encoder to extract patch-level features. These features are then aggregated by a Vision Transformer (ViT)-based aggregator module to produce a slide-level representation.

This aggregated representation is then passed through a linear classifier to perform downstream tasks such as molecular subtyping, tumor subtyping, and mutation prediction.
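
To make this concrete, here is a minimal sketch of the patch-encode / aggregate / classify flow. It is illustrative only: the class name, the stand-in transformer aggregator, the feature dimension, and the tensor shapes are assumptions for exposition, not the released implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the EXAONE Path 1.5 forward pass; module names,
# dimensions, and hyperparameters are assumptions, not the released code.
class SlideClassifierSketch(nn.Module):
    def __init__(self, patch_encoder: nn.Module, feat_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.patch_encoder = patch_encoder  # pretrained EXAONE Path 1.0 patch encoder
        self.aggregator = nn.TransformerEncoder(  # stand-in for the ViT-based aggregator
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.cls_token = nn.Parameter(torch.zeros(1, 1, feat_dim))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (num_patches, 3, H, W) tiles cut from one WSI
        feats = self.patch_encoder(patches)  # (num_patches, feat_dim)
        tokens = torch.cat([self.cls_token, feats.unsqueeze(0)], dim=1)
        slide_repr = self.aggregator(tokens)[:, 0]  # slide-level representation
        return self.classifier(slide_repr).softmax(dim=-1)  # class probabilities
```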

To effectively train the aggregator, we adopt a two-stage learning process (a loss sketch follows the list):

1. **Pretraining:** We employ multimodal learning, aligning slide images with mRNA gene expression profiles to learn semantically meaningful slide-level representations.
2. **Fine-tuning:** The pretrained model is then adapted to specific downstream classification tasks.
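
For intuition on the pretraining stage, below is a minimal sketch of one way to align slide and gene-expression embeddings: a CLIP-style symmetric InfoNCE loss. The actual objective, projection heads, and temperature used in EXAONE Path 1.5 are not specified in this card, so treat every detail as an assumption.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of slide / gene-expression alignment pretraining. A CLIP-style
# symmetric InfoNCE loss is assumed here purely for illustration.
def alignment_loss(slide_emb: torch.Tensor, gene_emb: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    # slide_emb, gene_emb: (batch, dim) projections of paired WSIs and mRNA profiles
    slide_emb = F.normalize(slide_emb, dim=-1)
    gene_emb = F.normalize(gene_emb, dim=-1)
    logits = slide_emb @ gene_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(len(logits), device=logits.device)
    # Symmetric cross-entropy: each slide matches its own expression profile and vice versa.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```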

In this repository, we release the model trained for EGFR mutation prediction in lung adenocarcinoma (LUAD), enabling researchers to leverage our pipeline for similar molecular pathology applications.

## Quickstart

### 1. Hardware Requirements

- NVIDIA GPU is required
- Minimum 40 GB of GPU memory recommended
- Tested on Ubuntu 22.04 with NVIDIA driver version 550.144.03

Note: The provided environment setup installs CUDA-enabled PyTorch, so an NVIDIA GPU and driver are mandatory for running the model.
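
A quick way to confirm the GPU is visible and large enough before running anything heavier (plain PyTorch calls, nothing model-specific):

```python
import torch

# Confirm a CUDA-capable GPU is visible and report its memory;
# this card recommends at least 40 GB.
assert torch.cuda.is_available(), "An NVIDIA GPU with CUDA support is required."
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB")
```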

### 2. Environment Setup

```bash
pip install -r requirements.txt
```

### 3-a. Load the model & Inference

#### Load model with HuggingFace

```python
from models.exaonepath import EXAONEPathV1p5Downstream

hf_token = "YOUR_HUGGING_FACE_ACCESS_TOKEN"

# Download the pretrained downstream model from the Hugging Face Hub
model = EXAONEPathV1p5Downstream.from_pretrained(
    "LGAI-EXAONE/EXAONE-Path-1.5", use_auth_token=hf_token
)

# Run end-to-end inference on a single whole slide image
slide_path = './samples/wsis/1/1.svs'
probs = model(slide_path)
```
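
The exact return type of `model(slide_path)` is defined by the repository; assuming `probs` holds per-class probabilities whose indices follow the `{'n': 0, 'y': 1}` label mapping used in the fine-tuning section below, the prediction can be read out like this:

```python
# Assumption: probs is a tensor of per-class probabilities indexed as
# {'n': 0, 'y': 1}; verify the actual output format against the repository.
pred = probs.argmax(-1).item()
print("EGFR mutant" if pred == 1 else "EGFR wild-type", f"(p={probs.max().item():.3f})")
```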

#### Fast CLI Inference

Before running the command below, make sure you update your Hugging Face token.
Open `tokens.py` and replace the placeholder with your actual token:

```python
HF_TOKEN = "YOUR_HUGGING_FACE_ACCESS_TOKEN"
```

Then run inference with:

```bash
python inference.py --svs_path ./samples/wsis/1/1.svs
```
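
The command above scores one slide per invocation; to score a whole directory, a small wrapper such as the following works (this loop is our own suggestion, not a utility shipped with the repository):

```python
import subprocess
from pathlib import Path

# Hypothetical batch wrapper: invokes the provided CLI once per .svs file.
for svs in sorted(Path("./samples/wsis").rglob("*.svs")):
    subprocess.run(["python", "inference.py", "--svs_path", str(svs)], check=True)
```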

### 3-b. Fine-tuning with Pretrained Weights

We provide example scripts and files to help you fine-tune the model on your own dataset. The provided script fine-tunes the model using the pretrained weights stored in `./pretrained_weight.pth`.

#### Extract Features from WSI Images

To train the model on WSI images and their corresponding labels, you must first extract patch-level features from each WSI using the provided feature extractor:

```bash
python feature_extract.py --input_dir ./samples/wsis/ --output_dir ./samples/feats/
```

This generates `.pt` feature files in the output directory.
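
To sanity-check the extraction, you can load one of the generated files with PyTorch. The file name below mirrors the sample slide, and the expected `(num_patches, feature_dim)` layout is an assumption to verify:

```python
import torch

# Inspect an extracted feature file; patch features are assumed to be stored
# as a (num_patches, feature_dim) tensor, but the exact layout may differ.
feats = torch.load("./samples/feats/1.pt", map_location="cpu")
print(type(feats), getattr(feats, "shape", None))
```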

#### Fine-tuning

```bash
bash tuning_script.sh
```

Inside `tuning_script.sh`, you can modify the following variables to match your dataset:

```bash
FEAT_PATH=./samples/feats
LABEL_PATH=./samples/label/label.csv
LABEL_DICT="{'n':0, 'y':1}"
SPLIT_PATH=./samples/splits
```

Change these paths to point to your own feature, label, and split files to start training.
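
For reference, a minimal label file compatible with `LABEL_DICT="{'n':0, 'y':1}"` might be written as below; the column names and slide IDs are assumptions, so follow the schema of the provided `./samples/label/label.csv` rather than this sketch.

```python
import csv

# Purely illustrative label file: slide IDs mapped to the 'n'/'y' classes
# consumed via LABEL_DICT. Column names are assumptions; mirror the provided
# ./samples/label/label.csv schema in practice.
with open("my_labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["slide_id", "label"])
    writer.writerows([("1", "y"), ("2", "n")])
```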

## Model Performance Comparison

| Metric: AUC | TITAN (CONCH v1.5 + iBOT, image + text) | PRISM (Virchow + Perceiver, image + text) | CHIEF (CTransPath + CLAM, image + text, CLAM + WSI contrastive) | Prov-GigaPath (GigaPath + LongNet, image-only, masked prediction) | UNI2-h + CLAM (image-only) | EXAONE Path 1.5 (image + gene expression) |
|---|---|---|---|---|---|---|
| **TMB (cutoff 10)** | 0.74 | 0.73 | 0.70 | 0.69 | 0.71 | 0.71 |
| **LUAD-EGFR-mut** | 0.76 | 0.80 | 0.73 | 0.73 | 0.79 | 0.81 |
| **LUAD-KRAS-mut** | 0.61 | 0.65 | 0.61 | 0.66 | 0.60 | 0.63 |
| **LUAD-Gene-overexp[1]** | 0.75 | 0.68 | 0.71 | 0.71 | 0.74 | 0.72 |
| **CRC-MSS/MSI** | 0.89 | 0.88 | 0.86 | 0.90 | 0.90 | 0.89 |
| **BRCA-ER_PR_HER2** | 0.82 | 0.79 | 0.76 | 0.79 | 0.81 | 0.77 |
| **Pan-cancer-Gene-mut[2]** | 0.79 | 0.77 | 0.73 | 0.74 | 0.77 | 0.76 |
| **Avg. AUC** | 0.77 | 0.76 | 0.73 | 0.74 | 0.77 | 0.76 |

[1]: **LUAD-Gene-overexp**: a total of 11 genes were evaluated: LAG3, CLDN6, CD274, EGFR, ERBB2, ERBB3, CD276, VTCN1, TACSTD2, FOLR1, MET.

[2]: **Pan-cancer-Gene-mut**: a total of 7 genes were evaluated: TP53, KRAS, ALK, PIK3CA, MET, EGFR, PTEN.

## License

The model is licensed under the [EXAONEPath AI Model License Agreement 1.0 - NC](./LICENSE).