
EXAONE Path 1.5

Introduction

EXAONE Path 1.5 is a whole-slide-image-level (WSI-level) classification framework designed for downstream tasks in pathology, such as cancer subtyping, molecular subtyping, and mutation prediction. It builds upon our previous work, EXAONE Path 1.0, which focused on patch-wise feature extraction by dividing a WSI into patches and embedding each patch into a feature vector. In EXAONE Path 1.5, we extend this pipeline to take an entire WSI as input. Each patch is first processed with the pretrained EXAONE Path 1.0 encoder to extract patch-level features. These features are then aggregated by a Vision Transformer (ViT)-based aggregator module to produce a slide-level representation, which is subsequently passed through a linear classifier to perform downstream tasks such as molecular subtyping, tumor subtyping, and mutation prediction.

To train the aggregator effectively, we adopt a two-stage learning process:

  • Pretraining: We employ multimodal learning that aligns slide images with mRNA gene expression profiles to learn semantically meaningful slide-level representations.
  • Fine-tuning: The pretrained model is then adapted to specific downstream classification tasks.

In this repository, we release the model trained for EGFR mutation prediction in lung adenocarcinoma (LUAD), enabling researchers to leverage our pipeline for similar molecular pathology applications.
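The patch-to-slide pipeline described above can be sketched as follows. This is a minimal illustration, not the released architecture: the class name, feature dimension, head/layer counts, and use of a CLS token are all assumptions for the sketch.

```python
import torch
import torch.nn as nn

class SlideAggregator(nn.Module):
    """Hypothetical sketch of a ViT-style aggregator over patch features."""

    def __init__(self, dim=768, n_heads=8, n_layers=2, n_classes=2):
        super().__init__()
        # Learnable CLS token whose final state serves as the slide-level representation.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(dim, n_classes)  # linear classifier for the downstream task

    def forward(self, patch_feats):
        # patch_feats: (batch, num_patches, dim) embeddings from the patch-level encoder.
        cls = self.cls_token.expand(patch_feats.size(0), -1, -1)
        tokens = torch.cat([cls, patch_feats], dim=1)
        slide_repr = self.encoder(tokens)[:, 0]      # CLS output = slide representation
        return self.head(slide_repr).softmax(dim=-1)  # class probabilities

feats = torch.randn(1, 100, 768)  # e.g. 100 patch embeddings from one WSI
probs = SlideAggregator()(feats)
```

In this two-step design, the patch encoder stays fixed while only the aggregator and classifier see slide-level supervision.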

Quickstart

1. Hardware Requirements

  • NVIDIA GPU is required
  • Minimum 40 GB GPU memory recommended
  • Tested on Ubuntu 22.04 with NVIDIA driver version 550.144.03

Note: This implementation requires an NVIDIA GPU and drivers. The provided environment setup uses CUDA-enabled PyTorch, so an NVIDIA GPU is mandatory for running the model.

2. Environment Setup

pip install -r requirements.txt

3-a. Load the model & Inference

Load model with HuggingFace

from models.exaonepath import EXAONEPathV1p5Downstream

# Hugging Face access token with permission to the gated repository
hf_token = "YOUR_HUGGING_FACE_ACCESS_TOKEN"
model = EXAONEPathV1p5Downstream.from_pretrained("LGAI-EXAONE/EXAONE-Path-1.5", use_auth_token=hf_token)

slide_path = './samples/wsis/1/1.svs'
probs = model(slide_path)  # class probabilities for the downstream task

Fast CLI Inference

Before running the command below, make sure you update your Hugging Face token. Open tokens.py and replace the placeholder with your actual token:

HF_TOKEN = "YOUR_HUGGING_FACE_ACCESS_TOKEN"

Then, run inference with:

python inference.py --svs_path ./samples/wsis/1/1.svs

3-b. Fine-tuning with Pretrained Weights

We provide example scripts and files to help you fine-tune the model on your own dataset. The provided script fine-tunes the model using pretrained weights stored in ./pretrained_weight.pth.

Extract Features from WSI Images

To train the model using WSI images and their corresponding labels,
you must first extract patch-level features from each WSI using our provided feature extractor.

python feature_extract.py --input_dir ./samples/wsis/ --output_dir ./samples/feats/

This will generate .pt feature files in the output_dir.
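A quick way to sanity-check an extracted file is to load it and inspect its shape. The filename and the (num_patches, feature_dim) shape below are illustrative assumptions, not the script's guaranteed output; here a dummy tensor stands in for a real feature file.

```python
import torch

# Stand-in for a patch-feature file as feature_extract.py might produce it
# (shape is an assumption: one row per patch, one column per feature dim).
dummy = torch.randn(100, 768)
torch.save(dummy, "demo_feats.pt")

feats = torch.load("demo_feats.pt")
num_patches, feature_dim = feats.shape
```

Each .pt file holds the patch embeddings for one WSI, which the fine-tuning script aggregates into a slide-level prediction.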

Fine-tuning

bash tuning_script.sh

Inside tuning_script.sh, you can modify the following variables to match your dataset:

FEAT_PATH=./samples/feats
LABEL_PATH=./samples/label/label.csv
LABEL_DICT="{'n':0, 'y':1}"
SPLIT_PATH=./samples/splits

Change these paths to point to your own feature, label, and split files to start training.
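For orientation, a label file consistent with the LABEL_DICT above could look like the sketch below. The column names and slide IDs are illustrative assumptions, not the repo's exact schema:

```python
import csv

# Mirrors LABEL_DICT in tuning_script.sh: class names mapped to integer targets.
label_dict = {"n": 0, "y": 1}

# Hypothetical layout: one row per slide with its class label.
rows = [("slide_001", "n"), ("slide_002", "y")]

with open("label.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["slide_id", "label"])
    writer.writerows(rows)

# Map each slide to its integer class, as a dataloader might.
labels = {slide_id: label_dict[lab] for slide_id, lab in rows}
```

The slide IDs in the label file should match the .pt feature filenames produced in the previous step.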

Model Performance Comparison

Metric: AUC

| Task | Titan (CONCH v1.5 + iBOT, image+text) | PRISM (Virchow + Perceiver, image+text) | CHIEF (CTransPath + CLAM, image+text) | Prov-GigaPath (GigaPath + LongNet, image-only) | UNI2-h + CLAM (image-only) | EXAONE Path 1.5 (image + gene expression) |
|---|---|---|---|---|---|---|
| TMB (cutoff 10) | 0.74 | 0.73 | 0.70 | 0.69 | 0.71 | 0.71 |
| LUAD-EGFR-mut | 0.76 | 0.80 | 0.73 | 0.73 | 0.79 | 0.81 |
| LUAD-KRAS-mut | 0.61 | 0.65 | 0.61 | 0.66 | 0.60 | 0.63 |
| LUAD-Gene-overexp[1] | 0.75 | 0.68 | 0.71 | 0.71 | 0.74 | 0.72 |
| CRC-MSS/MSI | 0.89 | 0.88 | 0.86 | 0.90 | 0.90 | 0.89 |
| BRCA-ER_PR_HER2 | 0.82 | 0.79 | 0.76 | 0.79 | 0.81 | 0.77 |
| Pan-cancer-Gene-mut[2] | 0.79 | 0.77 | 0.73 | 0.74 | 0.77 | 0.76 |
| Avg. AUC | 0.77 | 0.76 | 0.73 | 0.74 | 0.77 | 0.76 |

[1]: LUAD-Gene-overexp: 11 genes were evaluated: LAG3, CLDN6, CD274, EGFR, ERBB2, ERBB3, CD276, VTCN1, TACSTD2, FOLR1, MET.

[2]: Pan-cancer-Gene-mut: 7 genes were evaluated: TP53, KRAS, ALK, PIK3CA, MET, EGFR, PTEN.
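The Avg. AUC row is the unweighted mean of the seven task AUCs, which can be verified directly (shown here for the EXAONE Path 1.5 column):

```python
# Per-task AUCs for EXAONE Path 1.5, in table order:
# TMB, LUAD-EGFR, LUAD-KRAS, LUAD-Gene-overexp, CRC-MSS/MSI,
# BRCA-ER_PR_HER2, Pan-cancer-Gene-mut
exaone_aucs = [0.71, 0.81, 0.63, 0.72, 0.89, 0.77, 0.76]

# Unweighted mean, rounded to two decimals as reported in the table.
avg_auc = round(sum(exaone_aucs) / len(exaone_aucs), 2)
```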

License

The model is licensed under the EXAONEPath AI Model License Agreement 1.0 - NC.
