MOSAIC Model Checkpoints

MOSAIC (Multimodal Optical Slide Analysis Including Comparisons) is a framework for training vision-language models for computational pathology and running inference with them. The pre-trained models in this repository accompany the paper "Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions" by Lucassen et al. (2025), which was accepted at MICCAI 2025.

Model Variants

This repository contains three pre-trained model checkpoints:

  • mosaic-perceiver-biogpt-lora.pt - LoRA fine-tuned model (recommended for most use cases)
  • mosaic-perceiver-biogpt-frozen.pt - Frozen backbone model
  • mosaic-perceiver-biogpt-unfrozen.pt - Fully fine-tuned model

Quick Start

Installation

First, install the MOSAIC framework from the source repository:

git clone https://github.com/SanderMoon/MOSAIC.git
cd MOSAIC
pip install -e .
pip install git+https://github.com/salaniz/pycocoevalcap.git

Download Model Checkpoints

# Set your Hugging Face token (required for access)
export HF_TOKEN=your_huggingface_token_here

# Install huggingface_hub CLI
pip install "huggingface_hub[cli]"  # quoted so shells such as zsh do not expand the brackets

# Download the LoRA model (change filename for other models)
huggingface-cli download SaltySander/MOSAIC checkpoints/mosaic-perceiver-biogpt-lora.pt --local-dir . --local-dir-use-symlinks False
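
The same download can also be scripted from Python. The following is a minimal sketch, assuming the huggingface_hub package is installed and that the checkpoint files live under checkpoints/ in the repository, as in the CLI command above. The helper names checkpoint_filename and download_checkpoint are illustrative and are not part of MOSAIC.

```python
import os

REPO_ID = "SaltySander/MOSAIC"
VARIANTS = ("lora", "frozen", "unfrozen")


def checkpoint_filename(variant: str) -> str:
    """Return the in-repository path of a checkpoint (hypothetical helper)."""
    if variant not in VARIANTS:
        raise ValueError(f"Unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"checkpoints/mosaic-perceiver-biogpt-{variant}.pt"


def download_checkpoint(variant: str = "lora", local_dir: str = ".") -> str:
    """Download one checkpoint and return the path of the local file."""
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_filename(variant),
        local_dir=local_dir,
        token=os.environ.get("HF_TOKEN"),  # falls back to a stored login if unset
    )
```

Calling download_checkpoint("lora") should place the file under ./checkpoints/, which matches the pretrained_path used in the inference example.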

Inference Example

import torch

from mosaic.model_factory import create_model, load_pretrained

# Model configuration
model_name = "coca_stage_2_perceiver_lora_uni"  # Use appropriate config for your model
pretrained_path = "checkpoints/mosaic-perceiver-biogpt-lora.pt"
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present

# Create model and tokenizer
model, tokenizer, amp, input_dtype = create_model(
    model_name=model_name,
    pretrained=None,
    precision="bf16",
    device=device,
    init_tokenizer=True,
)

# Load pretrained weights
load_pretrained(model, pretrained=pretrained_path, device=device)

def load_features_from_pth(file_path: str) -> torch.Tensor:
    """Load features from a .pth file with nested dictionary structure."""
    data = torch.load(file_path, map_location=device)
    features_list = []
    
    for level_key in data.keys():
        level_data = data[level_key]
        for patch_id in sorted(level_data.keys()):
            if "feature" in level_data[patch_id]:
                feature = level_data[patch_id]["feature"]
                if not isinstance(feature, torch.Tensor):
                    feature = torch.tensor(feature)
                features_list.append(feature.to(device))
    
    if features_list:
        stacked_features = torch.stack(features_list, dim=0)
        return stacked_features.unsqueeze(0) 
    else:
        raise ValueError(f"No features found in {file_path}")

# Generation parameters
generation_params = {
    "seq_len": 128,
    "max_seq_len": 128,
    "temperature": 1.0,
    "generation_type": "top_k",
    "top_k": 1,  # with k=1 only the highest-scoring token can be chosen (effectively greedy decoding)
    "min_seq_len": 5,
    "repetition_penalty": 1.1,
}

# Process slide features (example)
slide_path = "path/to/your/slide_features.pth"
visual_features = load_features_from_pth(slide_path)

model.eval()
with torch.no_grad():
    # Generate pathology report
    generated_ids = model.generate(
        image=visual_features,
        sot_token_id=tokenizer.all_special_ids[0],
        eos_token_id=tokenizer.all_special_ids[1],
        pad_token_id=tokenizer.all_special_ids[3],
        **generation_params,
    )
    
    # Decode generated text
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    print(f"Generated Report: {generated_text.strip()}")
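
The load_features_from_pth function above expects a two-level dictionary: an outer key per level, an inner key per patch, and a "feature" entry in each patch record. The sketch below counts the usable patches in such a structure so that a malformed file can be caught before generation. It uses plain Python lists as stand-ins for tensors; count_features and the example keys ("level_0", "patch_000", "position") are illustrative assumptions, not names taken from MOSAIC.

```python
from typing import Any, Dict, Mapping


def count_features(data: Mapping[Any, Mapping[Any, Mapping[str, Any]]]) -> Dict[Any, int]:
    """Count patch records that carry a "feature" entry, per outer level key."""
    counts = {}
    for level_key, level_data in data.items():
        counts[level_key] = sum(
            1 for record in level_data.values() if "feature" in record
        )
    return counts


# Toy structure mirroring what the loader walks; lists stand in for tensors.
example = {
    "level_0": {
        "patch_000": {"feature": [0.1, 0.2, 0.3]},
        "patch_001": {"feature": [0.4, 0.5, 0.6]},
        "patch_002": {"position": (0, 512)},  # no "feature" key: skipped by the loader
    }
}

print(count_features(example))  # {'level_0': 2}
```

Assuming every feature is a one-dimensional vector of length D, a file with N usable patches yields a tensor of shape (1, N, D): the loader stacks the features along a new first axis and then adds a leading batch dimension.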

Model Configuration Mapping

Use the appropriate model configuration for each checkpoint:

  • LoRA model: coca_stage_2_perceiver_lora_uni
  • Frozen model: coca_stage_2_perceiver_frozen_uni
  • Unfrozen model: coca_stage_2_perceiver_unfrozen_uni
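
The mapping above can be captured in a small lookup so that a checkpoint and its configuration are always chosen together. This is a sketch rather than part of the MOSAIC API: CHECKPOINT_CONFIGS and config_for_checkpoint are hypothetical names, and the file names are the ones listed under Model Variants.

```python
from pathlib import Path

# Checkpoint file name -> model configuration name, as listed in this README.
CHECKPOINT_CONFIGS = {
    "mosaic-perceiver-biogpt-lora.pt": "coca_stage_2_perceiver_lora_uni",
    "mosaic-perceiver-biogpt-frozen.pt": "coca_stage_2_perceiver_frozen_uni",
    "mosaic-perceiver-biogpt-unfrozen.pt": "coca_stage_2_perceiver_unfrozen_uni",
}


def config_for_checkpoint(checkpoint_path: str) -> str:
    """Return the configuration name for a checkpoint path (hypothetical helper)."""
    name = Path(checkpoint_path).name
    try:
        return CHECKPOINT_CONFIGS[name]
    except KeyError:
        raise ValueError(f"No known configuration for checkpoint {name!r}") from None
```

The result can be passed as model_name to create_model, for example model_name = config_for_checkpoint(pretrained_path).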

Requirements

  • Python >= 3.10
  • PyTorch >= 2.0
  • transformers
  • CUDA-compatible GPU (recommended, but CPU is supported)
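
The requirements above can be checked before installing or loading a model. The following is a rough sketch using only the standard library; meets_minimum and check_environment are hypothetical helper names, and the PyTorch check looks only at the major version number reported by the installed torch package.

```python
import sys
from importlib import metadata


def meets_minimum(version: tuple, minimum: tuple) -> bool:
    """Compare the leading components of a version tuple against a minimum."""
    return tuple(version[: len(minimum)]) >= tuple(minimum)


def check_environment() -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not meets_minimum(sys.version_info, (3, 10)):
        problems.append("Python >= 3.10 is required")
    try:
        # Versions may look like "2.3.1+cu121"; only the major number is inspected.
        torch_major = int(metadata.version("torch").split(".")[0])
        if torch_major < 2:
            problems.append("PyTorch >= 2.0 is required")
    except metadata.PackageNotFoundError:
        problems.append("torch is not installed")
    try:
        metadata.version("transformers")
    except metadata.PackageNotFoundError:
        problems.append("transformers is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

GPU availability is not covered here; once PyTorch is installed, torch.cuda.is_available() reports whether a CUDA device can be used.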

Source Code

The complete source code, training scripts, and documentation are available at: https://github.com/SanderMoon/MOSAIC

Citation

If you use these models in your research, please cite our paper:

@misc{lucassen2025pathologyreportgenerationmultimodal,
    title={Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions},
    author={Ruben T. Lucassen and Sander P. J. Moonemans and Tijn van de Luijtgaarden and Gerben E. Breimer and Willeke A. M. Blokx and Mitko Veta},
    year={2025},
    eprint={2502.19293},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2502.19293},
}

License

This project is licensed under the Apache License 2.0. See the LICENSE file in the source repository for details.

Contact

For questions or support, please contact:


This work was developed as part of research into computational pathology and vision-language models for medical image analysis.


Base Model

microsoft/biogpt (this model is listed as a fine-tuned derivative)