Model card for MIPHEI-ViT

MIPHEI-ViT is a deep learning model that predicts 16-channel multiplex immunofluorescence (mIF) images from standard H&E-stained histology images. It uses a U-Net-style architecture with a ViT foundation model (H-Optimus-0) as the encoder, inspired by the ViTMatte model.

This work is described in our paper:

“MIPHEI-ViT: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models.”

Please see the publication for full results and details.

The model was trained on a processed version of the ORION-CRC dataset, available here: 🔗 MIPHEI-ViT Dataset on Zenodo

It takes H&E image tiles as input and outputs 16-channel mIF predictions for the following markers: Hoechst, CD31, CD45, CD68, CD4, FOXP3, CD8a, CD45RO, CD20, PD-L1, CD3e, CD163, E-cadherin, Ki67, Pan-CK, SMA

For optimal performance, input H&E images should come from colon tissue and be scanned at 0.5 µm/pixel (20x magnification). However, because the model is built on a large ViT foundation model (H-Optimus-0), you may also try applying it to other tissue types.
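If your slides were scanned at a different resolution, one simple option is to resize each tile so that one pixel covers about 0.5 µm before running the model. The sketch below only computes the target pixel size from the source resolution. The helper name `size_at_target_mpp` is hypothetical and not part of the MIPHEI-ViT code, and plain resampling is an assumption, not necessarily the authors' preprocessing.

```python
def size_at_target_mpp(width, height, source_mpp, target_mpp=0.5):
    """Return the (width, height) in pixels that a tile should have at target_mpp.

    Hypothetical helper: the physical extent (pixels * µm/pixel) is kept constant,
    so a tile scanned at 0.25 µm/pixel (about 40x) shrinks by a factor of 2.
    """
    if source_mpp <= 0 or target_mpp <= 0:
        raise ValueError("resolutions must be positive (µm/pixel)")
    scale = source_mpp / target_mpp
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 1024x1024 tile at 0.25 µm/pixel covers 256x256 µm, i.e. 512x512 pixels at 0.5 µm/pixel
print(size_at_target_mpp(1024, 1024, source_mpp=0.25))  # (512, 512)
```

With Pillow, the result can be passed directly to `Image.resize`, e.g. `img.resize(size_at_target_mpp(*img.size, source_mpp=0.25))`, since both `Image.size` and `Image.resize` use `(width, height)` order.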

MIPHEI-ViT Architecture

Figure: Overview of the MIPHEI-ViT architecture.

This model was developed as part of research funded by Sanofi and ANRT.

🚀 Demo

You can try the model directly in your browser and upload your own H&E images:

🔍 Model Usage

Clone the model repository

This brings the code and files (including model.py, weights, config, etc.) to your machine:

git lfs install  # only needed once, if not already done
git clone https://huggingface.co/Estabousi/MIPHEI-vit
cd MIPHEI-vit
pip install -r requirements.txt # torch, timm, safetensors, numpy, Pillow, huggingface_hub

Load the model

import torch
from model import MIPHEIViT

device = "cuda" if torch.cuda.is_available() else "cpu"
width, height = 256, 256  # example tile size: each side must be a power of 2 and at least 128

model = MIPHEIViT.from_pretrained_hf(repo_path=".")
model.set_input_size((width, height))
model.eval().to(device).half()  # half precision is faster on GPU; on CPU, consider removing .half()
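The comment on `set_input_size` states that each side must be a power of 2 and at least 128 pixels. A small check like the one below can catch an invalid size before the model is configured. `is_valid_input_size` is a hypothetical helper, not part of `model.py`; it relies on the fact that `n & (n - 1) == 0` holds exactly when a positive integer `n` is a power of two.

```python
def is_valid_input_size(width, height, min_side=128):
    """Hypothetical check: each side is a power of 2 and at least min_side pixels."""
    def ok(n):
        # A positive power of two has a single set bit, so n & (n - 1) clears it to 0
        return isinstance(n, int) and n >= min_side and n & (n - 1) == 0
    return ok(width) and ok(height)


for size in [(256, 256), (512, 128), (384, 384), (64, 64)]:
    print(size, is_valid_input_size(*size))
```

For example, `assert is_valid_input_size(width, height)` can be placed just before `model.set_input_size((width, height))`.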

Run inference on an H&E tile

from PIL import Image
import torchvision.transforms as T

# Load and preprocess your tile
img = Image.open("tile.jpg").convert("RGB")

transform = T.Compose([
    T.Resize((height, width)),  # torchvision's Resize expects (height, width)
    T.ToTensor(),  # Converts to shape [3, H, W], range [0, 1]
    T.Normalize(
        mean=(0.707223, 0.578729, 0.703617),
        std=(0.211883, 0.230117, 0.177517)
    ),  # H-optimus-0 normalization
])
tile_tensor = transform(img).unsqueeze(0)  # Add batch dim: [1, 3, height, width]

# Predict mIF channels
with torch.inference_mode():
    # Match the model's device and precision (float16 if .half() was used)
    model_dtype = next(model.parameters()).dtype
    mif_pred = model(tile_tensor.to(device=device, dtype=model_dtype)).squeeze(0)  # Output: [16, height, width]
    mif_pred = (mif_pred.clamp(-0.9, 0.9) + 0.9) / 1.8  # [-0.9, 0.9] -> [0, 1]
    mif_pred = (mif_pred * 255).to(torch.uint8)
    mif_pred = mif_pred.permute((1, 2, 0)).cpu()  # Output: [height, width, 16]
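The last lines of the inference code clamp the raw output to [-0.9, 0.9] and rescale it linearly to 8-bit intensities. The NumPy sketch below reproduces that arithmetic on a few sample values so the mapping is easy to check: -0.9 and anything below become 0, 0.9 and anything above become 255, and 0 lands at 127 because the cast to uint8 truncates rather than rounds. It only illustrates the postprocessing shown above; `to_uint8` is a hypothetical name.

```python
import numpy as np


def to_uint8(pred, low=-0.9, high=0.9):
    """Map raw predictions to 0-255 like the PyTorch snippet: clamp, rescale to [0, 1], scale, truncate."""
    pred = np.clip(np.asarray(pred, dtype=np.float64), low, high)
    unit = (pred - low) / (high - low)  # [low, high] -> [0, 1]
    return (unit * 255).astype(np.uint8)  # astype truncates toward zero, like .to(torch.uint8)


print(to_uint8([-2.0, -0.9, 0.0, 0.9, 2.0]).tolist())  # [0, 0, 127, 255, 255]
```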

Output corresponds to the following 16 markers:

['Hoechst', 'CD31', 'CD45', 'CD68', 'CD4', 'FOXP3', 'CD8a', 'CD45RO',
 'CD20', 'PD-L1', 'CD3e', 'CD163', 'E-cadherin', 'Ki67', 'Pan-CK', 'SMA']
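Because the channel order is fixed, it can be convenient to index the [height, width, 16] output by marker name. The sketch below uses NumPy and a dummy array in place of a real prediction (for example `mif_pred.numpy()` from the inference step). `MARKERS` is the list above; `split_by_marker` is a hypothetical helper, not part of the repository.

```python
import numpy as np

MARKERS = ['Hoechst', 'CD31', 'CD45', 'CD68', 'CD4', 'FOXP3', 'CD8a', 'CD45RO',
           'CD20', 'PD-L1', 'CD3e', 'CD163', 'E-cadherin', 'Ki67', 'Pan-CK', 'SMA']


def split_by_marker(pred):
    """Return {marker name: [height, width] channel} for an array shaped [height, width, 16]."""
    if pred.ndim != 3 or pred.shape[-1] != len(MARKERS):
        raise ValueError(f"expected shape [H, W, {len(MARKERS)}], got {pred.shape}")
    return {name: pred[..., i] for i, name in enumerate(MARKERS)}


# Dummy stand-in for a real prediction: channel i is filled with the value i
dummy = np.broadcast_to(np.arange(16, dtype=np.uint8), (128, 128, 16))
channels = split_by_marker(dummy)
print(channels["CD8a"].shape, channels["CD8a"][0, 0])  # (128, 128) 6
```

Each 2D uint8 channel can then be saved as a grayscale image, for instance with Pillow's `Image.fromarray`.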

You can also try our model in Google Colab:

📁 Files Included

model.py: model architecture
model.safetensors: pretrained weights
logreg.pth: pretrained cell type linear classifier
config_hf.json: inference configuration used by Hugging Face
config.yaml: training configuration parameters
requirements.txt: pip packages required to run the model

📖 Citation

If you use this work, please cite:

G. Balezo, R. Trullo, A. Pla Planas, E. Decenciere, and T. Walter, “MIPHEI-ViT: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models,” arXiv preprint arXiv:2505.10294, 2025.

🧪 More Details

For full training, preprocessing, visualizations, and evaluations, visit the

📄 License

Released by Sanofi under specific license conditions, including a limitation to non-commercial use only. See the LICENSE file for details.
