Claris RF-Channel: DehazeFormer with Reference Frame Input

This repository provides a PyTorch implementation of a DehazeFormer-based model for removing smoke and haze from surgical endoscopic scenes. The model uses a transformer-based backbone. It receives a reference frame concatenated with the input image along the channel dimension, which it uses to improve visibility and suppress smoke/haze artifacts.
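
To illustrate what "concatenated along the channel dimension" implies for the network input, here is a minimal sketch, assuming both images are RGB tensors of identical size. Stacking them yields 6 input channels, so the first layer must accept 6 channels. The class name RefChannelStem and the embedding width of 24 are hypothetical choices for this example, not taken from dehazeformer.py.

```python
import torch
import torch.nn as nn


class RefChannelStem(nn.Module):
    """Toy input stem for a frame + reference-frame model (illustrative only).

    Concatenating two RGB images along the channel axis gives 6 input channels,
    so the first convolution must accept in_channels=6. The embedding width of
    24 is an arbitrary, hypothetical choice for this sketch.
    """

    def __init__(self, embed_dim: int = 24):
        super().__init__()
        self.proj = nn.Conv2d(6, embed_dim, kernel_size=3, padding=1)

    def forward(self, frame: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        x = torch.cat([frame, ref], dim=1)  # (B, 3, H, W) x 2 -> (B, 6, H, W)
        return self.proj(x)                 # -> (B, embed_dim, H, W)


stem = RefChannelStem()
feats = stem(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
print(feats.shape)  # torch.Size([2, 24, 64, 64])
```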

This model version, also referred to as mct-diffusion-overlay-p40-v1-rf-channel.pth, was trained with a combination of synthetic overlays and diffusion-generated smoke-haze image pairs.
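
The exact overlay recipe used for training is not documented here. As a rough sketch of what a synthetic (hazy, clean) pair can look like, the example below alpha-blends a smooth, spatially varying smoke mask over a clean frame. The function name make_overlay_pair, the smoke color, and the maximum opacity are hypothetical parameters chosen only for illustration.

```python
import torch
import torch.nn.functional as F


def make_overlay_pair(clean: torch.Tensor, smoke_color: float = 0.9,
                      max_alpha: float = 0.6, seed: int = 0):
    """Hypothetical synthetic-overlay recipe: returns (hazy, clean).

    clean: (3, H, W) float tensor in [0, 1].
    hazy = clean * (1 - alpha) + smoke * alpha, with a smooth per-pixel alpha.
    """
    g = torch.Generator().manual_seed(seed)
    _, h, w = clean.shape
    # Coarse random noise upsampled bilinearly gives a smooth, cloud-like mask.
    coarse = torch.rand(1, 1, max(h // 32, 1), max(w // 32, 1), generator=g)
    alpha = F.interpolate(coarse, size=(h, w), mode="bilinear", align_corners=False)[0]
    alpha = alpha * max_alpha                  # (1, H, W), broadcasts over channels
    smoke = torch.full_like(clean, smoke_color)
    hazy = clean * (1 - alpha) + smoke * alpha
    return hazy.clamp(0, 1), clean


hazy, target = make_overlay_pair(torch.rand(3, 64, 64))
```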

Features

Transformer-based architecture for image enhancement.
Supports reference frame input via channel or spatial concatenation.
Hugging Face Transformers-compatible interface.
Example inference script included.
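
The two concatenation options produce differently shaped inputs. A minimal sketch, assuming both images are (B, C, H, W) tensors of the same size; the helper name combine_reference and its mode argument are hypothetical and not part of this repository's API.

```python
import torch


def combine_reference(frame: torch.Tensor, ref: torch.Tensor, mode: str = "channel") -> torch.Tensor:
    """Combine a frame and its reference frame into one tensor (illustrative helper).

    Both inputs are assumed to be (B, C, H, W) tensors with identical shapes.
    """
    if frame.shape != ref.shape:
        raise ValueError(f"shape mismatch: {tuple(frame.shape)} vs {tuple(ref.shape)}")
    if mode == "channel":
        # Stack along channels: (B, C, H, W) + (B, C, H, W) -> (B, 2C, H, W)
        return torch.cat([frame, ref], dim=1)
    if mode == "spatial":
        # Place side by side along the width axis: -> (B, C, H, 2W)
        return torch.cat([frame, ref], dim=3)
    raise ValueError(f"unknown mode: {mode!r}")


frame, ref = torch.rand(1, 3, 128, 160), torch.rand(1, 3, 128, 160)
print(combine_reference(frame, ref, "channel").shape)  # torch.Size([1, 6, 128, 160])
print(combine_reference(frame, ref, "spatial").shape)  # torch.Size([1, 3, 128, 320])
```

Channel concatenation keeps the spatial size but changes the channel count of the first layer; spatial concatenation keeps 3 channels but doubles the width the network has to process.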

File Structure

claris_rf_channel/
├── dehazeformer.py
├── inference_example.py
├── pytorch_model.bin
├── config.json
├── sample_img.png
└── ref_img.png

Quick Start

Install Requirements

pip install torch torchvision transformers timm pillow

Inference Example

You can run the provided inference script to dehaze the sample image:

python inference_example.py

This will save the output as output_img_rfchannel.png.

Alternatively, load the model in your own code as follows:

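import torch
from PIL import Image
from transformers import AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model (trust_remote_code=True runs the custom model code from the repository)
model = AutoModel.from_pretrained("vopeai/claris-RF-channel", trust_remote_code=True)
model.to(device)
model.eval()

# Load the hazy input image and its reference frame
input_img = Image.open("sample_img.png").convert("RGB")
ref_img = Image.open("ref_img.png").convert("RGB")

# Inference
with torch.no_grad():
    output = model(input_img, ref_img)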

The model accepts its inputs either as Pillow images or as tensors.

For more details, see the code files in this repository.