MOSAIC Model Checkpoints
MOSAIC (Multimodal Optical Slide Analysis Including Comparisons) is a framework for training and running inference with vision-language models for computational pathology. These pre-trained checkpoints accompany the paper "Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions" by Lucassen et al. (2025), accepted at MICCAI 2025.
Model Variants
This repository contains three pre-trained model checkpoints:
- mosaic-perceiver-biogpt-lora.pt - LoRA fine-tuned model (recommended for most use cases)
- mosaic-perceiver-biogpt-frozen.pt - Frozen backbone model
- mosaic-perceiver-biogpt-unfrozen.pt - Fully fine-tuned model
Quick Start
Installation
First, install the MOSAIC framework from the source repository:
git clone https://github.com/SanderMoon/MOSAIC.git
cd MOSAIC
pip install -e .
pip install git+https://github.com/salaniz/pycocoevalcap.git
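After installation, an optional sanity check is to import the factory functions used in the inference example below (these are the same imports used later in this card):

# Optional sanity check: confirms the MOSAIC package and its model factory are importable.
from mosaic.model_factory import create_model, load_pretrained
print("MOSAIC installed:", callable(create_model) and callable(load_pretrained))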
Download Model Checkpoints
# Set your Hugging Face token (required for access)
export HF_TOKEN=your_huggingface_token_here
# Install huggingface_hub CLI
pip install "huggingface_hub[cli]"
# Download the LoRA model (change filename for other models)
huggingface-cli download SaltySander/MOSAIC checkpoints/mosaic-perceiver-biogpt-lora.pt --local-dir . --local-dir-use-symlinks False
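Alternatively, the checkpoint can be downloaded programmatically with the huggingface_hub Python API. This is a minimal sketch; it assumes your HF_TOKEN environment variable grants access to the repository:

import os
from huggingface_hub import hf_hub_download

# Download checkpoints/mosaic-perceiver-biogpt-lora.pt into the current directory
checkpoint_path = hf_hub_download(
    repo_id="SaltySander/MOSAIC",
    filename="checkpoints/mosaic-perceiver-biogpt-lora.pt",
    local_dir=".",
    token=os.environ.get("HF_TOKEN"),
)
print(checkpoint_path)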
Inference Example
from mosaic.model_factory import create_model, load_pretrained
import torch
import os
# Model configuration
model_name = "coca_stage_2_perceiver_lora_uni" # Use appropriate config for your model
pretrained_path = "checkpoints/mosaic-perceiver-biogpt-lora.pt"
device = "cpu" # or "cuda" if available
# Create model and tokenizer
model, tokenizer, amp, input_dtype = create_model(
    model_name=model_name,
    pretrained=None,
    precision="bf16",
    device=device,
    init_tokenizer=True,
)
# Load pretrained weights
load_pretrained(model, pretrained=pretrained_path, device=device)
def load_features_from_pth(file_path: str) -> torch.Tensor:
"""Load features from a .pth file with nested dictionary structure."""
data = torch.load(file_path, map_location=device)
features_list = []
for level_key in data.keys():
level_data = data[level_key]
for patch_id in sorted(level_data.keys()):
if "feature" in level_data[patch_id]:
feature = level_data[patch_id]["feature"]
if not isinstance(feature, torch.Tensor):
feature = torch.tensor(feature)
features_list.append(feature.to(device))
if features_list:
stacked_features = torch.stack(features_list, dim=0)
return stacked_features.unsqueeze(0)
else:
raise ValueError(f"No features found in {file_path}")
# Generation parameters
generation_params = {
"seq_len": 128,
"max_seq_len": 128,
"temperature": 1.0,
"generation_type": "top_k",
"top_k": 1,
"min_seq_len": 5,
"repetition_penalty": 1.1,
}
# Process slide features (example)
slide_path = "path/to/your/slide_features.pth"
visual_features = load_features_from_pth(slide_path)
model.eval()
with torch.no_grad():
    # Generate pathology report
    generated_ids = model.generate(
        image=visual_features,
        sot_token_id=tokenizer.all_special_ids[0],
        eos_token_id=tokenizer.all_special_ids[1],
        pad_token_id=tokenizer.all_special_ids[3],
        **generation_params,
    )
# Decode generated text
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated Report: {generated_text.strip()}")
Model Configuration Mapping
Use the appropriate model configuration for each checkpoint:
- LoRA model: coca_stage_2_perceiver_lora_uni
- Frozen model: coca_stage_2_perceiver_frozen_uni
- Unfrozen model: coca_stage_2_perceiver_unfrozen_uni
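If you switch between checkpoints in code, a small lookup table keeps the checkpoint file and its configuration name in sync. This is a convenience sketch; only the config names listed above come from this repository:

# Map each checkpoint file to its matching model configuration name.
CHECKPOINT_CONFIGS = {
    "checkpoints/mosaic-perceiver-biogpt-lora.pt": "coca_stage_2_perceiver_lora_uni",
    "checkpoints/mosaic-perceiver-biogpt-frozen.pt": "coca_stage_2_perceiver_frozen_uni",
    "checkpoints/mosaic-perceiver-biogpt-unfrozen.pt": "coca_stage_2_perceiver_unfrozen_uni",
}

pretrained_path = "checkpoints/mosaic-perceiver-biogpt-frozen.pt"
model_name = CHECKPOINT_CONFIGS[pretrained_path]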
Requirements
- Python >= 3.10
- PyTorch >= 2.0
- transformers
- CUDA-compatible GPU (recommended, but CPU is supported)
Source Code
The complete source code, training scripts, and documentation are available at: https://github.com/SanderMoon/MOSAIC
Citation
If you use these models in your research, please cite our paper:
@misc{lucassen2025pathologyreportgenerationmultimodal,
      title={Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions},
      author={Ruben T. Lucassen and Sander P. J. Moonemans and Tijn van de Luijtgaarden and Gerben E. Breimer and Willeke A. M. Blokx and Mitko Veta},
      year={2025},
      eprint={2502.19293},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.19293},
}
License
This project is licensed under the Apache License 2.0. See the LICENSE file in the source repository for details.
Contact
For questions or support, please contact:
- Sander Moonemans: [email protected]
This work was developed as part of research into computational pathology and vision-language models for medical image analysis.