---
license: mit
---

# ✨ ModalFormer: Multimodal Transformer for Low-Light Image Enhancement
<div align="center">
**[Alexandru Brateanu](https://scholar.google.com/citations?user=ru0meGgAAAAJ&hl=en), [Raul Balmez](https://scholar.google.com/citations?user=vPC7raQAAAAJ&hl=en), [Ciprian Orhei](https://scholar.google.com/citations?user=DZHdq3wAAAAJ&hl=en), [Codruta Ancuti](https://scholar.google.com/citations?user=5PA43eEAAAAJ&hl=en), [Cosmin Ancuti](https://scholar.google.com/citations?user=zVTgt8IAAAAJ&hl=en)**
[arXiv Paper](https://arxiv.org/abs/2507.20388)
</div>
### Abstract
*Low-light image enhancement (LLIE) is a fundamental yet challenging task due to the presence of noise, loss of detail, and poor contrast in images captured under insufficient lighting conditions. Recent methods often rely solely on pixel-level transformations of RGB images, neglecting the rich contextual information available from multiple visual modalities. In this paper, we present ModalFormer, the first large-scale multimodal framework for LLIE that fully exploits nine auxiliary modalities to achieve state-of-the-art performance. Our model comprises two main components: a Cross-modal Transformer (CM-T) designed to restore corrupted images while seamlessly integrating multimodal information, and multiple auxiliary subnetworks dedicated to multimodal feature reconstruction. Central to the CM-T is our novel Cross-modal Multi-headed Self-Attention mechanism (CM-MSA), which effectively fuses RGB data with modality-specific features—including deep feature embeddings, segmentation information, geometric cues, and color information—to generate information-rich hybrid attention maps. Extensive experiments on multiple benchmark datasets demonstrate ModalFormer’s state-of-the-art performance in LLIE. Pre-trained models and results are made available at https://github.com/albrateanu/ModalFormer*
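The core idea behind the CM-MSA described above is to let attention fuse RGB features with auxiliary-modality features. As a rough illustration of that general idea only (this is not the paper's architecture; all shapes, projections, and names below are assumptions), here is a minimal numpy sketch in which queries come from RGB features while keys and values come from an auxiliary modality:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(rgb_feats, modality_feats, num_heads=4):
    """Illustrative cross-modal multi-head attention: queries from RGB
    features, keys/values from auxiliary-modality features, so the
    attention maps mix information across modalities.
    A sketch only, not the paper's CM-MSA."""
    n, d = rgb_feats.shape
    assert d % num_heads == 0
    dh = d // num_heads
    rng = np.random.default_rng(0)
    # Random projection weights stand in for learned parameters.
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q = (rgb_feats @ wq).reshape(n, num_heads, dh)
    k = (modality_feats @ wk).reshape(n, num_heads, dh)
    v = (modality_feats @ wv).reshape(n, num_heads, dh)
    # Per-head scaled dot-product attention over token positions.
    attn = softmax(np.einsum('nhd,mhd->hnm', q, k) / np.sqrt(dh), axis=-1)
    out = np.einsum('hnm,mhd->nhd', attn, v).reshape(n, d)
    return out, attn
```

Because the keys and values are computed from the auxiliary modality, each row of `attn` is a hybrid attention map conditioned on both inputs, which is the flavor of fusion the abstract describes.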
## 🆕 Updates
- `29.07.2025` 🎉 The [**ModalFormer**](https://arxiv.org/abs/2507.20388) paper is now available! Check it out and explore our results and methodology.
- `28.07.2025` 📦 Pre-trained models and test data published! arXiv paper version and HuggingFace demo coming soon, stay tuned!
## ⚙️ Setup and Testing
Please check out the [**GitHub repository**](https://github.com/albrateanu/ModalFormer) for implementation details.
## 📚 Citation
```bibtex
@misc{brateanu2025modalformer,
      title={ModalFormer: Multimodal Transformer for Low-Light Image Enhancement},
      author={Alexandru Brateanu and Raul Balmez and Ciprian Orhei and Codruta Ancuti and Cosmin Ancuti},
      year={2025},
      eprint={2507.20388},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.20388},
}
```
## 🙏 Acknowledgements
We use [this codebase](https://github.com/caiyuanhao1998/Retinexformer) as the foundation for our implementation.