MOSAIC Model Checkpoints
MOSAIC (Multimodal Optical Slide Analysis Including Comparisons) is a framework for training and running inference with vision-language models for computational pathology. These pre-trained checkpoints accompany the paper "Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions" by Lucassen et al. (2025), accepted at MICCAI 2025.
Model Variants
This repository contains three pre-trained model checkpoints:
- mosaic-perceiver-biogpt-lora.pt - LoRA fine-tuned model (recommended for most use cases)
- mosaic-perceiver-biogpt-frozen.pt - Frozen backbone model
- mosaic-perceiver-biogpt-unfrozen.pt - Fully fine-tuned model
Quick Start
Installation
First, install the MOSAIC framework from the source repository:
git clone https://github.com/SanderMoon/MOSAIC.git
cd MOSAIC
pip install -e .
pip install git+https://github.com/salaniz/pycocoevalcap.git
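After installation, an optional sanity check is to import the factory functions used in the inference example below (these are the same imports used later in this card):

# Optional sanity check: confirms the MOSAIC package and its model factory are importable.
from mosaic.model_factory import create_model, load_pretrained
print("MOSAIC installed:", callable(create_model) and callable(load_pretrained))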
Download Model Checkpoints
# Set your Hugging Face token (required for access)
export HF_TOKEN=your_huggingface_token_here
# Install huggingface_hub CLI
pip install "huggingface_hub[cli]"
# Download the LoRA model (change filename for other models)
huggingface-cli download SaltySander/MOSAIC checkpoints/mosaic-perceiver-biogpt-lora.pt --local-dir . --local-dir-use-symlinks False
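Alternatively, the checkpoint can be downloaded programmatically with the huggingface_hub Python API. This is a minimal sketch; it assumes your HF_TOKEN environment variable grants access to the repository:

import os
from huggingface_hub import hf_hub_download

# Download checkpoints/mosaic-perceiver-biogpt-lora.pt into the current directory
checkpoint_path = hf_hub_download(
    repo_id="SaltySander/MOSAIC",
    filename="checkpoints/mosaic-perceiver-biogpt-lora.pt",
    local_dir=".",
    token=os.environ.get("HF_TOKEN"),
)
print(checkpoint_path)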
Inference Example
from mosaic.model_factory import create_model, load_pretrained
import torch
import os
# Model configuration
model_name = "coca_stage_2_perceiver_lora_uni" # Use appropriate config for your model
pretrained_path = "checkpoints/mosaic-perceiver-biogpt-lora.pt"
device = "cpu" # or "cuda" if available
# Create model and tokenizer
model, tokenizer, amp, input_dtype = create_model(
    model_name=model_name,
    pretrained=None,
    precision="bf16",
    device=device,
    init_tokenizer=True,
)
# Load pretrained weights
load_pretrained(model, pretrained=pretrained_path, device=device)
def load_features_from_pth(file_path: str) -> torch.Tensor:
"""Load features from a .pth file with nested dictionary structure."""
data = torch.load(file_path, map_location=device)
features_list = []
for level_key in data.keys():
level_data = data[level_key]
for patch_id in sorted(level_data.keys()):
if "feature" in level_data[patch_id]:
feature = level_data[patch_id]["feature"]
if not isinstance(feature, torch.Tensor):
feature = torch.tensor(feature)
features_list.append(feature.to(device))
if features_list:
stacked_features = torch.stack(features_list, dim=0)
return stacked_features.unsqueeze(0)
else:
raise ValueError(f"No features found in {file_path}")
# Generation parameters
generation_params = {
"seq_len": 128,
"max_seq_len": 128,
"temperature": 1.0,
"generation_type": "top_k",
"top_k": 1,
"min_seq_len": 5,
"repetition_penalty": 1.1,
}
# Process slide features (example)
slide_path = "path/to/your/slide_features.pth"
visual_features = load_features_from_pth(slide_path)
model.eval()
with torch.no_grad():
    # Generate pathology report
    generated_ids = model.generate(
        image=visual_features,
        sot_token_id=tokenizer.all_special_ids[0],
        eos_token_id=tokenizer.all_special_ids[1],
        pad_token_id=tokenizer.all_special_ids[3],
        **generation_params,
    )
# Decode generated text
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated Report: {generated_text.strip()}")
Model Configuration Mapping
Use the appropriate model configuration for each checkpoint:
- LoRA model: coca_stage_2_perceiver_lora_uni
- Frozen model: coca_stage_2_perceiver_frozen_uni
- Unfrozen model: coca_stage_2_perceiver_unfrozen_uni
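If you switch between checkpoints in code, a small lookup table keeps the checkpoint file and its configuration name in sync. This is a convenience sketch; only the config names listed above come from this repository:

# Map each checkpoint file to its matching model configuration name.
CHECKPOINT_CONFIGS = {
    "checkpoints/mosaic-perceiver-biogpt-lora.pt": "coca_stage_2_perceiver_lora_uni",
    "checkpoints/mosaic-perceiver-biogpt-frozen.pt": "coca_stage_2_perceiver_frozen_uni",
    "checkpoints/mosaic-perceiver-biogpt-unfrozen.pt": "coca_stage_2_perceiver_unfrozen_uni",
}

pretrained_path = "checkpoints/mosaic-perceiver-biogpt-frozen.pt"
model_name = CHECKPOINT_CONFIGS[pretrained_path]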
Requirements
- Python >= 3.10
- PyTorch >= 2.0
- transformers
- CUDA-compatible GPU (recommended, but CPU is supported)
Source Code
The complete source code, training scripts, and documentation are available at: https://github.com/SanderMoon/MOSAIC
Citation
If you use these models in your research, please cite our paper:
@misc{lucassen2025pathologyreportgenerationmultimodal,
      title={Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions},
      author={Ruben T. Lucassen and Sander P. J. Moonemans and Tijn van de Luijtgaarden and Gerben E. Breimer and Willeke A. M. Blokx and Mitko Veta},
      year={2025},
      eprint={2502.19293},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.19293},
}
License
This project is licensed under the Apache License 2.0. See the LICENSE file in the source repository for details.
Contact
For questions or support, please contact:
- Sander Moonemans: [email protected]
This work was developed as part of research into computational pathology and vision-language models for medical image analysis.