albrateanu committed · Commit 726aaad · verified · 1 Parent(s): cadcd4c

Update README.md

Files changed (1)
  1. README.md +39 -3
README.md CHANGED
@@ -1,3 +1,39 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+
+ # ✨ ModalFormer: Multimodal Transformer for Low-Light Image Enhancement
+
+ <div align="center">
+
+ **[Alexandru Brateanu](https://scholar.google.com/citations?user=ru0meGgAAAAJ&hl=en), [Raul Balmez](https://scholar.google.com/citations?user=vPC7raQAAAAJ&hl=en), [Ciprian Orhei](https://scholar.google.com/citations?user=DZHdq3wAAAAJ&hl=en), [Codruta Ancuti](https://scholar.google.com/citations?user=5PA43eEAAAAJ&hl=en), [Cosmin Ancuti](https://scholar.google.com/citations?user=zVTgt8IAAAAJ&hl=en)**
+
+ [![arXiv](https://img.shields.io/badge/arxiv-paper-179bd3)](https://arxiv.org/abs/2507.20388)
+ </div>
+
+ ### Abstract
+ *Low-light image enhancement (LLIE) is a fundamental yet challenging task due to the presence of noise, loss of detail, and poor contrast in images captured under insufficient lighting conditions. Recent methods often rely solely on pixel-level transformations of RGB images, neglecting the rich contextual information available from multiple visual modalities. In this paper, we present ModalFormer, the first large-scale multimodal framework for LLIE that fully exploits nine auxiliary modalities to achieve state-of-the-art performance. Our model comprises two main components: a Cross-modal Transformer (CM-T) designed to restore corrupted images while seamlessly integrating multimodal information, and multiple auxiliary subnetworks dedicated to multimodal feature reconstruction. Central to the CM-T is our novel Cross-modal Multi-headed Self-Attention mechanism (CM-MSA), which effectively fuses RGB data with modality-specific features—including deep feature embeddings, segmentation information, geometric cues, and color information—to generate information-rich hybrid attention maps. Extensive experiments on multiple benchmark datasets demonstrate ModalFormer’s state-of-the-art performance in LLIE. Pre-trained models and results are made available at https://github.com/albrateanu/ModalFormer*
+
+ ## 🆕 Updates
+ - `29.07.2025` 🎉 The [**ModalFormer**](https://arxiv.org/abs/2507.20388) paper is now available! Check it out and explore our results and methodology.
+ - `28.07.2025` 📦 Pre-trained models and test data published! arXiv paper version and Hugging Face demo coming soon, stay tuned!
+
+ ## ⚙️ Setup and Testing
+ Please check out the [**GitHub repository**](https://github.com/albrateanu/ModalFormer) for implementation details.
+
+ ## 📚 Citation
+
+ ```
+ @misc{brateanu2025modalformer,
+ title={ModalFormer: Multimodal Transformer for Low-Light Image Enhancement},
+ author={Alexandru Brateanu and Raul Balmez and Ciprian Orhei and Codruta Ancuti and Cosmin Ancuti},
+ year={2025},
+ eprint={2507.20388},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2507.20388},
+ }
+ ```
+
+ ## 🙏 Acknowledgements
+ We use [this codebase](https://github.com/caiyuanhao1998/Retinexformer) as the foundation for our implementation.