Bingxin committed
Commit 7629caa · verified · 1 Parent(s): 3ca82f2

Update README.md

Files changed (1)
  1. README.md +68 -39
README.md CHANGED
@@ -1,58 +1,87 @@
  ---
- license: apache-2.0
  language:
  - en
  pipeline_tag: depth-estimation
  tags:
- - monocular depth estimation
- - single image depth estimation
- - depth
  - in-the-wild
  - zero-shot
- - depth
  ---

- # Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

- This model represents the internal test checkpoint of affine-invariant disparity version (train_marigold_affine_disparity_iter_24000).

- [![Website](doc/badges/badge-website.svg)](https://marigoldmonodepth.github.io)
- [![GitHub](https://img.shields.io/github/stars/prs-eth/Marigold?style=default&label=GitHub%20★&logo=github)](https://github.com/prs-eth/Marigold)
- [![Paper](doc/badges/badge-pdf.svg)](https://arxiv.org/abs/2312.02145)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12G8reD13DdpMie5ZQlaFNo2WCGeNUH-u?usp=sharing)
- [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/toshas/marigold)
- [![License](https://img.shields.io/badge/License-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)
- <!-- [![HF Space](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Space-blue)]() -->
- <!-- [![Open In Colab](doc/badges/badge-colab.svg)]() -->
- <!-- [![Docker](doc/badges/badge-docker.svg)]() -->
- <!-- ### [Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation]() -->

- [Bingxin Ke](http://www.kebingxin.com/),
- [Anton Obukhov](https://www.obukhov.ai/),
- [Shengyu Huang](https://shengyuh.github.io/),
- [Nando Metzger](https://nandometzger.github.io/),
- [Rodrigo Caye Daudt](https://rcdaudt.github.io/),
- [Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ&hl=en)

- We present Marigold, a diffusion model and associated fine-tuning protocol for monocular depth estimation. Its core principle is to leverage the rich visual knowledge stored in modern generative image models. Our model, derived from Stable Diffusion and fine-tuned with synthetic data, can zero-shot transfer to unseen data, offering state-of-the-art monocular depth estimation results.

- ![teaser](doc/teaser_collage_transparant.png)

- ## 🎓 Citation

  ```bibtex
- @InProceedings{ke2023repurposing,
-   title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
-   author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
-   booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
-   year={2024}
  }
- ```
-
- ## 🎫 License
-
- This work is licensed under the Apache License, Version 2.0 (as defined in the [LICENSE](LICENSE.txt)).
-
- By downloading and using the code and model you agree to the terms in the [LICENSE](LICENSE.txt).
-
- [![License](https://img.shields.io/badge/License-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)

  ---
  language:
  - en
+ license: openrail++
  pipeline_tag: depth-estimation
+ library_name: diffusers
  tags:
+ - depth estimation
+ - image analysis
+ - computer vision
  - in-the-wild
  - zero-shot
+ pinned: true
  ---

+ <h1 align="center">Marigold Disparity v0.1 Model Card</h1>

+ <!-- This model represents the internal test checkpoint of affine-invariant disparity version (train_marigold_affine_disparity_iter_24000). -->

+ <p align="center">
+   <a title="Image Depth" href="https://huggingface.co/spaces/prs-eth/marigold" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+     <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Image%20Depth%20-Demo-yellow" alt="Image Depth">
+   </a>
+   <a title="diffusers" href="https://huggingface.co/docs/diffusers/using-diffusers/marigold_usage" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+     <img src="https://img.shields.io/badge/%F0%9F%A4%97%20diffusers%20-Integration%20🧨-yellow" alt="diffusers">
+   </a>
+   <a title="Github" href="https://github.com/prs-eth/marigold" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+     <img src="https://img.shields.io/github/stars/prs-eth/marigold?label=GitHub%20%E2%98%85&logo=github&color=C8C" alt="Github">
+   </a>
+   <a title="Website" href="https://marigoldcomputervision.github.io/" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+     <img src="https://img.shields.io/badge/%E2%99%A5%20Project%20-Website-blue" alt="Website">
+   </a>
+   <a title="arXiv" href="https://arxiv.org/abs/2505.09358" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+     <img src="https://img.shields.io/badge/%F0%9F%93%84%20Read%20-Paper-AF3436" alt="arXiv">
+   </a>
+   <a title="Social" href="https://twitter.com/antonobukhov1" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+     <img src="https://img.shields.io/twitter/follow/:?label=Subscribe%20for%20updates!" alt="Social">
+   </a>
+   <a title="License" href="https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+     <img src="https://img.shields.io/badge/License-OpenRAIL++-929292" alt="License">
+   </a>
+ </p>

+ This is a model card for the `marigold-disparity-affine-v0-1` model for monocular depth estimation from a single image.
+ The model is fine-tuned from the `stable-diffusion-2` [model](https://huggingface.co/stabilityai/stable-diffusion-2) as
+ described in our papers, in inverse depth (disparity) space:
+ - [CVPR'2024 paper](https://hf.co/papers/2312.02145) titled "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation"
+ - [Journal extension](https://hf.co/papers/2505.09358) titled "Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis"

+ ### Using the model
+ - Play with the interactive [Hugging Face Spaces demo](https://huggingface.co/spaces/prs-eth/marigold): check out how the model works with example images or upload your own.
+ - Use it with [diffusers](https://huggingface.co/docs/diffusers/using-diffusers/marigold_usage) to compute the results with a few lines of code.
+ - Get to the bottom of things with our [official codebase](https://github.com/prs-eth/marigold).
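
The diffusers route mentioned above looks roughly like this minimal sketch (it assumes a diffusers release that ships `MarigoldDepthPipeline`; the repository id is a placeholder standing in for this checkpoint's actual Hub id):

```python
import diffusers
import torch

# Load the Marigold depth pipeline; the repo id below is a placeholder,
# substitute the actual Hub id of this checkpoint.
pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-disparity-affine-v0-1",
    torch_dtype=torch.float16,
).to("cuda")

# Any RGB image works; very large inputs are best resized so the longer
# side is about 768 px (see the resolution note in the Model Details).
image = diffusers.utils.load_image(
    "https://marigoldmonodepth.github.io/images/einstein.jpg"
)

result = pipe(image)  # MarigoldDepthOutput; .prediction holds values in [0, 1]

# Colorize the affine-invariant prediction and save it.
vis = pipe.image_processor.visualize_depth(result.prediction)
vis[0].save("einstein_depth_colored.png")
```
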
+ ## Model Details
+ - **Developed by:** [Bingxin Ke](http://www.kebingxin.com/), [Kevin Qu](https://ch.linkedin.com/in/kevin-qu-b3417621b), [Tianfu Wang](https://tianfwang.github.io/), [Nando Metzger](https://nandometzger.github.io/), [Shengyu Huang](https://shengyuh.github.io/), [Bo Li](https://www.linkedin.com/in/bobboli0202), [Anton Obukhov](https://www.obukhov.ai/), [Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ).
+ - **Model type:** Generative latent diffusion-based affine-invariant monocular depth estimation from a single image.
+ - **Language:** English.
+ - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL).
+ - **Model Description:** This model can be used to generate an estimated depth map of an input image.
+ - **Resolution:** Even though any resolution can be processed, the model inherits the base diffusion model's effective resolution of roughly **768** pixels.
+   This means that for optimal predictions, any larger input image should be resized to make the longer side 768 pixels before feeding it into the model.
+ - **Steps and scheduler:** This model was designed for usage with the **DDIM** scheduler and between **1 and 50** denoising steps.
+ - **Outputs:**
+   - **Affine-invariant depth map:** The predicted values are between 0 and 1, interpolating between the near and far planes of the model's choice.
+   - **Uncertainty map:** Produced only when multiple predictions are ensembled with ensemble size larger than 2.
+ - **Resources for more information:** [Project Website](https://marigoldcomputervision.github.io/), [Paper](https://arxiv.org/abs/2505.09358), [Code](https://github.com/prs-eth/marigold).
+ - **Cite as:**

  ```bibtex
+ @misc{ke2025marigold,
+   title={Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis},
+   author={Bingxin Ke and Kevin Qu and Tianfu Wang and Nando Metzger and Shengyu Huang and Bo Li and Anton Obukhov and Konrad Schindler},
+   year={2025},
+   eprint={2505.09358},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV}
  }

+ @InProceedings{ke2023repurposing,
+   title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
+   author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
+   booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+   year={2024}
+ }
+ ```
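
The resolution, scheduler, and ensembling notes in the Model Details above map onto the pipeline arguments roughly as in this sketch; the values are illustrative, and `pipe` and `image` are the objects from the earlier snippet:

```python
import torch

# Illustrative settings following the model card:
#  - a small number of denoising steps (the card suggests 1-50 with DDIM),
#  - processing at the base model's effective resolution of ~768 px,
#  - ensembling several predictions, which also yields an uncertainty map.
result = pipe(
    image,
    num_inference_steps=10,        # within the 1-50 range suggested above
    processing_resolution=768,     # longer side is resized to 768 px internally
    ensemble_size=5,               # more than 2 predictions makes the uncertainty map meaningful
    output_uncertainty=True,
    generator=torch.Generator(device="cuda").manual_seed(2024),
)

depth = result.prediction          # affine-invariant values in [0, 1]
uncertainty = result.uncertainty   # per-pixel disagreement across the ensemble
```
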