---
license: other
license_name: exaonepath
license_link: LICENSE
tags:
- lg-ai
- EXAONEPath-1.5
- pathology
---

# EXAONE Path 1.5

## Introduction

EXAONE Path 1.5 is a whole-slide-image-level (WSI-level) classification framework designed for downstream tasks in pathology, such as cancer subtyping, molecular subtyping, and mutation prediction. It builds upon our previous work, EXAONE Path 1.0, which focused on patch-wise feature extraction: a WSI is divided into patches and each patch is embedded into a feature vector.

In EXAONE Path 1.5, we extend this pipeline to take an entire WSI as input. Each patch is first processed with the pretrained EXAONE Path 1.0 encoder to extract patch-level features. These features are then aggregated by a Vision Transformer (ViT)-based aggregator module to produce a slide-level representation.

This aggregated representation is then passed through a linear classifier to perform downstream tasks such as molecular subtyping, tumor subtyping, and mutation prediction.
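
To make this concrete, here is a minimal sketch of the patch-encode / aggregate / classify flow. It is illustrative only: the class name, the stand-in transformer aggregator, the feature dimension, and the tensor shapes are assumptions for exposition, not the released implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the EXAONE Path 1.5 forward pass; module names,
# dimensions, and hyperparameters are assumptions, not the released code.
class SlideClassifierSketch(nn.Module):
    def __init__(self, patch_encoder: nn.Module, feat_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.patch_encoder = patch_encoder  # pretrained EXAONE Path 1.0 patch encoder
        self.aggregator = nn.TransformerEncoder(  # stand-in for the ViT-based aggregator
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.cls_token = nn.Parameter(torch.zeros(1, 1, feat_dim))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (num_patches, 3, H, W) tiles cut from one WSI
        feats = self.patch_encoder(patches)  # (num_patches, feat_dim)
        tokens = torch.cat([self.cls_token, feats.unsqueeze(0)], dim=1)
        slide_repr = self.aggregator(tokens)[:, 0]  # slide-level representation
        return self.classifier(slide_repr).softmax(dim=-1)  # class probabilities
```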

To effectively train the aggregator, we adopt a two-stage learning process (a loss sketch follows the list):

1. **Pretraining:** We employ multimodal learning, aligning slide images with mRNA gene expression profiles to learn semantically meaningful slide-level representations.
2. **Fine-tuning:** The pretrained model is then adapted to specific downstream classification tasks.
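
For intuition on the pretraining stage, below is a minimal sketch of one way to align slide and gene-expression embeddings: a CLIP-style symmetric InfoNCE loss. The actual objective, projection heads, and temperature used in EXAONE Path 1.5 are not specified in this card, so treat every detail as an assumption.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of slide / gene-expression alignment pretraining. A CLIP-style
# symmetric InfoNCE loss is assumed here purely for illustration.
def alignment_loss(slide_emb: torch.Tensor, gene_emb: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    # slide_emb, gene_emb: (batch, dim) projections of paired WSIs and mRNA profiles
    slide_emb = F.normalize(slide_emb, dim=-1)
    gene_emb = F.normalize(gene_emb, dim=-1)
    logits = slide_emb @ gene_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(len(logits), device=logits.device)
    # Symmetric cross-entropy: each slide matches its own expression profile and vice versa.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```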

In this repository, we release the model trained for EGFR mutation prediction in lung adenocarcinoma (LUAD), enabling researchers to leverage our pipeline for similar molecular pathology applications.

## Quickstart

### 1. Hardware Requirements

- NVIDIA GPU is required
- Minimum 40 GB of GPU memory recommended
- Tested on Ubuntu 22.04 with NVIDIA driver version 550.144.03

Note: The provided environment setup installs CUDA-enabled PyTorch, so an NVIDIA GPU and driver are mandatory for running the model.
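
A quick way to confirm the GPU is visible and large enough before running anything heavier (plain PyTorch calls, nothing model-specific):

```python
import torch

# Confirm a CUDA-capable GPU is visible and report its memory;
# this card recommends at least 40 GB.
assert torch.cuda.is_available(), "An NVIDIA GPU with CUDA support is required."
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB")
```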

### 2. Environment Setup

```bash
pip install -r requirements.txt
```

### 3-a. Load the model & Inference

#### Load model with HuggingFace

```python
from models.exaonepath import EXAONEPathV1p5Downstream

hf_token = "YOUR_HUGGING_FACE_ACCESS_TOKEN"

# Download the pretrained downstream model from the Hugging Face Hub
model = EXAONEPathV1p5Downstream.from_pretrained(
    "LGAI-EXAONE/EXAONE-Path-1.5", use_auth_token=hf_token
)

# Run end-to-end inference on a single whole slide image
slide_path = './samples/wsis/1/1.svs'
probs = model(slide_path)
```
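
The exact return type of `model(slide_path)` is defined by the repository; assuming `probs` holds per-class probabilities whose indices follow the `{'n': 0, 'y': 1}` label mapping used in the fine-tuning section below, the prediction can be read out like this:

```python
# Assumption: probs is a tensor of per-class probabilities indexed as
# {'n': 0, 'y': 1}; verify the actual output format against the repository.
pred = probs.argmax(-1).item()
print("EGFR mutant" if pred == 1 else "EGFR wild-type", f"(p={probs.max().item():.3f})")
```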

#### Fast CLI Inference

Before running the command below, make sure you update your Hugging Face token.
Open `tokens.py` and replace the placeholder with your actual token:

```python
HF_TOKEN = "YOUR_HUGGING_FACE_ACCESS_TOKEN"
```

Then run inference with:

```bash
python inference.py --svs_path ./samples/wsis/1/1.svs
```
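
The command above scores one slide per invocation; to score a whole directory, a small wrapper such as the following works (this loop is our own suggestion, not a utility shipped with the repository):

```python
import subprocess
from pathlib import Path

# Hypothetical batch wrapper: invokes the provided CLI once per .svs file.
for svs in sorted(Path("./samples/wsis").rglob("*.svs")):
    subprocess.run(["python", "inference.py", "--svs_path", str(svs)], check=True)
```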

### 3-b. Fine-tuning with Pretrained Weights

We provide example scripts and files to help you fine-tune the model on your own dataset. The provided script fine-tunes the model using the pretrained weights stored in `./pretrained_weight.pth`.

#### Extract Features from WSI Images

To train the model on WSI images and their corresponding labels, you must first extract patch-level features from each WSI using the provided feature extractor:

```bash
python feature_extract.py --input_dir ./samples/wsis/ --output_dir ./samples/feats/
```

This generates `.pt` feature files in the output directory.
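
To sanity-check the extraction, you can load one of the generated files with PyTorch. The file name below mirrors the sample slide, and the expected `(num_patches, feature_dim)` layout is an assumption to verify:

```python
import torch

# Inspect an extracted feature file; patch features are assumed to be stored
# as a (num_patches, feature_dim) tensor, but the exact layout may differ.
feats = torch.load("./samples/feats/1.pt", map_location="cpu")
print(type(feats), getattr(feats, "shape", None))
```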

#### Fine-tuning

```bash
bash tuning_script.sh
```

Inside `tuning_script.sh`, you can modify the following variables to match your dataset:

```bash
FEAT_PATH=./samples/feats
LABEL_PATH=./samples/label/label.csv
LABEL_DICT="{'n':0, 'y':1}"
SPLIT_PATH=./samples/splits
```

Change these paths to point to your own feature, label, and split files to start training.
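
For reference, a minimal label file compatible with `LABEL_DICT="{'n':0, 'y':1}"` might be written as below; the column names and slide IDs are assumptions, so follow the schema of the provided `./samples/label/label.csv` rather than this sketch.

```python
import csv

# Purely illustrative label file: slide IDs mapped to the 'n'/'y' classes
# consumed via LABEL_DICT. Column names are assumptions; mirror the provided
# ./samples/label/label.csv schema in practice.
with open("my_labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["slide_id", "label"])
    writer.writerows([("1", "y"), ("2", "n")])
```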

## Model Performance Comparison

| Metric: AUC | TITAN (CONCH v1.5 + iBOT, image + text) | PRISM (Virchow + Perceiver, image + text) | CHIEF (CTransPath + CLAM, image + text, CLAM + WSI contrastive) | Prov-GigaPath (GigaPath + LongNet, image-only, masked prediction) | UNI2-h + CLAM (image-only) | EXAONE Path 1.5 (image + gene expression) |
|---|---|---|---|---|---|---|
| **TMB (cutoff 10)** | 0.74 | 0.73 | 0.70 | 0.69 | 0.71 | 0.71 |
| **LUAD-EGFR-mut** | 0.76 | 0.80 | 0.73 | 0.73 | 0.79 | 0.81 |
| **LUAD-KRAS-mut** | 0.61 | 0.65 | 0.61 | 0.66 | 0.60 | 0.63 |
| **LUAD-Gene-overexp[1]** | 0.75 | 0.68 | 0.71 | 0.71 | 0.74 | 0.72 |
| **CRC-MSS/MSI** | 0.89 | 0.88 | 0.86 | 0.90 | 0.90 | 0.89 |
| **BRCA-ER_PR_HER2** | 0.82 | 0.79 | 0.76 | 0.79 | 0.81 | 0.77 |
| **Pan-cancer-Gene-mut[2]** | 0.79 | 0.77 | 0.73 | 0.74 | 0.77 | 0.76 |
| **Avg. AUC** | 0.77 | 0.76 | 0.73 | 0.74 | 0.77 | 0.76 |

[1]: **LUAD-Gene-overexp**: a total of 11 genes were evaluated: LAG3, CLDN6, CD274, EGFR, ERBB2, ERBB3, CD276, VTCN1, TACSTD2, FOLR1, MET.

[2]: **Pan-cancer-Gene-mut**: a total of 7 genes were evaluated: TP53, KRAS, ALK, PIK3CA, MET, EGFR, PTEN.

## License

The model is licensed under the [EXAONEPath AI Model License Agreement 1.0 - NC](./LICENSE).