toshas committed
Commit 1619d3a

Initial commit
.gitattributes ADDED
@@ -0,0 +1,38 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,3 @@
+ .idea
+ .DS_Store
+ __pycache__
LICENSE.txt ADDED
@@ -0,0 +1,177 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
README.md ADDED
@@ -0,0 +1,24 @@
+ ---
+ title: Marigold Depth Completion
+ emoji: 🏵️
+ colorFrom: blue
+ colorTo: red
+ sdk: gradio
+ sdk_version: 4.44.1
+ app_file: app.py
+ pinned: true
+ license: apache-2.0
+ models:
+ - prs-eth/marigold-depth-v1-0
+ ---
+
+ This is a demo of the Marigold-DC monocular depth completion pipeline, which builds on the CVPR 2024 paper ["Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation"](https://arxiv.org/abs/2312.02145).
+
+ ```
+ @InProceedings{ke2023repurposing,
+   title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
+   author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
+   booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+   year={2024}
+ }
+ ```
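
A minimal sketch for running the pipeline outside the Space, assuming the LFS example files were fetched with `git lfs pull`; the checkpoint name and the 768 px processing cap follow `app.py`:

```python
import numpy as np
import torch
from PIL import Image

from marigold_dc import MarigoldDepthCompletionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = MarigoldDepthCompletionPipeline.from_pretrained(
    "prs-eth/marigold-depth-v1-0"  # checkpoint name as used in app.py
).to(device)

image = Image.open("files/kitti_1.png")
sparse = np.load("files/kitti_1.npy")  # 2D float array, zeros at missing positions

# The pipeline is a generator: every denoising step yields the current metric
# depth estimate (a [1,1,H,W] tensor) and the RMSE against the sparse points.
for pred, rmse in pipe(
    image=image,
    sparse_depth=sparse,
    num_inference_steps=50,
    processing_resolution=768,
):
    print(f"RMSE vs. sparse guidance: {rmse:.4f}")
```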
app.py ADDED
@@ -0,0 +1,340 @@
+ # Copyright 2024 Anton Obukhov, ETH Zurich. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ # --------------------------------------------------------------------------
+ # If you find this code useful, we kindly ask you to cite our paper in your work.
+ # Please find bibtex at: https://github.com/prs-eth/Marigold#-citation
+ # More information about the method can be found at https://marigoldmonodepth.github.io
+ # --------------------------------------------------------------------------
+
+ import functools
+ import os
+
+ import spaces
+ import gradio as gr
+ import numpy as np
+ import plotly.graph_objects as go
+ import torch
+ from PIL import Image
+ from scipy.ndimage import maximum_filter
+
+ from marigold_dc import MarigoldDepthCompletionPipeline
+
+ from gradio_imageslider import ImageSlider
+ from huggingface_hub import login
+
+ DRY_RUN = False
+
+
+ def dilate_rgb_image(image, kernel_size):
+     # Dilate each channel independently so the sparse depth dots
+     # become large enough to see in the visualization.
+     r_channel, g_channel, b_channel = image[..., 0], image[..., 1], image[..., 2]
+     r_dilated = maximum_filter(r_channel, size=kernel_size)
+     g_dilated = maximum_filter(g_channel, size=kernel_size)
+     b_dilated = maximum_filter(b_channel, size=kernel_size)
+     dilated_image = np.stack([r_dilated, g_dilated, b_dilated], axis=-1)
+     return dilated_image
+
+
+ def generate_rmse_plot(steps, metrics, denoise_steps):
+     y_min = min(metrics)
+     y_max = max(metrics)
+     fig = go.Figure()
+     fig.add_trace(
+         go.Scatter(
+             x=steps,
+             y=metrics,
+             mode="lines+markers",
+             line=dict(color="#af2928"),
+             name="RMSE",
+         )
+     )
+
+     if denoise_steps < 20:
+         x_dtick = 1
+     else:
+         x_dtick = 5
+
+     fig.update_layout(
+         autosize=False,
+         height=300,
+         xaxis_title="Steps",
+         xaxis_range=[0, denoise_steps + 1],
+         xaxis=dict(
+             scaleanchor="y",
+             scaleratio=1.5,
+             dtick=x_dtick,
+         ),
+         yaxis_title="RMSE",
+         yaxis_range=[np.log10(max(y_min - 0.1, 0.1)), np.log10(y_max + 1)],
+         yaxis=dict(
+             type="log",
+         ),
+         hovermode="x unified",
+         template="plotly_white",
+     )
+     return fig
+
+
+ def process(
+     pipe,
+     path_image,
+     path_sparse,
+     denoise_steps,
+ ):
+     image = Image.open(path_image)
+     sparse_depth = np.load(path_sparse)
+     sparse_depth_valid = sparse_depth[sparse_depth > 0]
+     sparse_depth_min = np.min(sparse_depth_valid)
+     sparse_depth_max = np.max(sparse_depth_valid)
+     width, height = image.size
+     max_dim = max(width, height)
+
+     # Cap the processing resolution of large inputs at 768 px.
+     processing_resolution = 0
+     if max_dim > 768:
+         processing_resolution = 768
+
+     metrics = []
+     steps = []
+
+     for step, (pred, rmse) in enumerate(
+         pipe(
+             image=image,
+             sparse_depth=sparse_depth,
+             num_inference_steps=denoise_steps + 1,
+             processing_resolution=processing_resolution,
+             dry_run=DRY_RUN,
+         )
+     ):
+         # Shared color range across the sparse input and the prediction.
+         min_both = min(sparse_depth_min, pred.min().item())
+         max_both = max(sparse_depth_max, pred.max().item())
+         metrics.append(rmse)
+         steps.append(step)
+
+         vis_pred = pipe.image_processor.visualize_depth(
+             pred, val_min=min_both, val_max=max_both
+         )[0]
+
+         vis_sparse = pipe.image_processor.visualize_depth(
+             sparse_depth, val_min=min_both, val_max=max_both
+         )[0]
+         vis_sparse = np.array(vis_sparse)
+         vis_sparse[sparse_depth <= 0] = (0, 0, 0)
+         vis_sparse = dilate_rgb_image(vis_sparse, kernel_size=5)
+         vis_sparse = Image.fromarray(vis_sparse)
+
+         plot = generate_rmse_plot(steps, metrics, denoise_steps)
+
+         yield (
+             [vis_sparse, vis_pred],
+             plot,
+         )
+
+
+ def run_demo_server(pipe):
+     process_pipe = spaces.GPU(functools.partial(process, pipe))
+     os.environ["GRADIO_ALLOW_FLAGGING"] = "never"
+
+     with gr.Blocks(
+         analytics_enabled=False,
+         title="Marigold Depth Completion",
+         css="""
+             #short {
+                 height: 130px;
+             }
+             .slider .inner {
+                 width: 4px;
+                 background: #FFF;
+             }
+             .slider .icon-wrap svg {
+                 fill: #FFF;
+                 stroke: #FFF;
+                 stroke-width: 3px;
+             }
+             #viewport {
+                 aspect-ratio: 4/3;
+             }
+             h1 {
+                 text-align: center;
+                 display: block;
+             }
+             h2 {
+                 text-align: center;
+                 display: block;
+             }
+             h3 {
+                 text-align: center;
+                 display: block;
+             }
+         """,
+     ) as demo:
+         gr.HTML(
+             """
+             <h1>⇆ Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion</h1>
+             <p align="center">
+                 <a title="Website" href="https://MarigoldDepthCompletion.github.io/" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+                     <img src="https://img.shields.io/badge/%F0%9F%A4%8D%20Project%20-Website-blue" alt="Website Badge">
+                 </a>
+                 <a title="arXiv" href="https://arxiv.org/" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+                     <img src="https://img.shields.io/badge/%F0%9F%93%84%20Read%20-Paper-af2928" alt="arXiv Badge">
+                 </a>
+                 <a title="Github" href="https://github.com/prs-eth/marigold-dc" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+                     <img src="https://img.shields.io/github/stars/prs-eth/marigold-dc?label=GitHub&logo=github&color=C8C" alt="badge-github-stars">
+                 </a>
+                 <a title="Social" href="https://twitter.com/antonobukhov1" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
+                     <img src="https://www.obukhov.ai/img/badges/badge-social.svg" alt="social">
+                 </a><br>
+                 Start exploring the interactive examples at the bottom of the page!
+             </p>
+             """
+         )
+
+         with gr.Row():
+             with gr.Column():
+                 input_image = gr.Image(
+                     label="Input Image",
+                     type="filepath",
+                 )
+                 input_sparse = gr.File(
+                     label="Input sparse depth (numpy file)",
+                     elem_id="short",
+                 )
+                 with gr.Accordion("Advanced options", open=False):
+                     denoise_steps = gr.Slider(
+                         label="Number of denoising steps",
+                         minimum=10,
+                         maximum=50,
+                         step=1,
+                         value=10,
+                     )
+                 with gr.Row():
+                     submit_btn = gr.Button(value="Compute Depth", variant="primary")
+                     clear_btn = gr.Button(value="Clear")
+             with gr.Column():
+                 output_slider = ImageSlider(
+                     label="Completed depth (red-near, blue-far)",
+                     type="filepath",
+                     show_download_button=True,
+                     show_share_button=True,
+                     interactive=False,
+                     elem_classes="slider",
+                     position=0.25,
+                 )
+                 plot = gr.Plot(
+                     label="RMSE between input and result",
+                     elem_id="viewport",
+                 )
+
+         inputs = [
+             input_image,
+             input_sparse,
+             denoise_steps,
+         ]
+         outputs = [
+             output_slider,
+             plot,
+         ]
+
+         def submit_depth_fn(path_image, path_sparse, denoise_steps):
+             yield from process_pipe(path_image, path_sparse, denoise_steps)
+
+         submit_btn.click(
+             fn=submit_depth_fn,
+             inputs=inputs,
+             outputs=outputs,
+         )
+
+         gr.Examples(
+             fn=submit_depth_fn,
+             examples=[
+                 [
+                     "files/kitti_1.png",
+                     "files/kitti_1.npy",
+                     10,  # denoise_steps
+                 ],
+                 [
+                     "files/kitti_2.png",
+                     "files/kitti_2.npy",
+                     10,  # denoise_steps
+                 ],
+                 [
+                     "files/teaser.png",
+                     "files/teaser_1000.npy",
+                     10,  # denoise_steps
+                 ],
+                 [
+                     "files/teaser.png",
+                     "files/teaser_100.npy",
+                     10,  # denoise_steps
+                 ],
+                 [
+                     "files/teaser.png",
+                     "files/teaser_10.npy",
+                     10,  # denoise_steps
+                 ],
+             ],
+             inputs=inputs,
+             outputs=outputs,
+             cache_examples="lazy",
+         )
+
+         def clear_fn():
+             return [
+                 gr.Image(value=None, interactive=True),
+                 gr.File(None, interactive=True),
+                 None,
+             ]
+
+         clear_btn.click(
+             fn=clear_fn,
+             inputs=[],
+             outputs=[
+                 input_image,
+                 input_sparse,
+                 output_slider,
+             ],
+         )
+
+     demo.queue(
+         api_open=False,
+     ).launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+     )
+
+
+ def main():
+     CHECKPOINT = "prs-eth/marigold-depth-v1-0"
+
+     os.system("pip freeze")  # log the environment for easier debugging
+
+     if "HF_TOKEN_LOGIN" in os.environ:
+         login(token=os.environ["HF_TOKEN_LOGIN"])
+
+     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+     pipe = MarigoldDepthCompletionPipeline.from_pretrained(CHECKPOINT)
+
+     try:
+         import xformers  # noqa: F401
+
+         pipe.enable_xformers_memory_efficient_attention()
+     except Exception:
+         pass  # run without xformers
+
+     pipe = pipe.to(device)
+     run_demo_server(pipe)
+
+
+ if __name__ == "__main__":
+     main()
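
`process` and `submit_depth_fn` above are generator functions; Gradio re-renders the bound outputs on every `yield`, which is what makes the depth preview and the RMSE plot update live during denoising. A minimal, self-contained sketch of that streaming pattern, with hypothetical component names not taken from this app:

```python
import time

import gradio as gr


def stream(n):
    # Each yield pushes an intermediate value to the UI,
    # just like process() yields once per denoising step.
    for i in range(int(n)):
        time.sleep(0.2)
        yield f"step {i + 1}/{int(n)}"


with gr.Blocks() as demo:
    steps = gr.Number(value=5, label="Steps")
    progress = gr.Textbox(label="Progress")
    gr.Button("Run").click(fn=stream, inputs=steps, outputs=progress)

if __name__ == "__main__":
    demo.queue().launch()
```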
files/kitti_1.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d7700e39fa4ccacd974ba2d76c3c4d94016e266f1cb99153a9d7ba89b4d46962
+ size 3424384
files/kitti_1.png ADDED

Git LFS Details
• SHA256: fde3b58a9c1dfde2dbeb464535df195880c972da1619dc00eaa7fe74fd0784ee
• Pointer size: 131 bytes
• Size of remote file: 728 kB
files/kitti_2.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a26a4c670640071c599e068f9b932e22a261150f3ecda1e46827751629c925f
+ size 3424384
files/kitti_2.png ADDED

Git LFS Details
• SHA256: 9e93cd1517f28597e0f2726d52f6054c31288a2d67ae0dfaf960db3605843215
• Pointer size: 131 bytes
• Size of remote file: 694 kB
files/teaser.png ADDED

Git LFS Details
• SHA256: 6218bd424d631e3f3e22905c900049f6b770e9a18e2562716fc4ad880af939f4
• Pointer size: 131 bytes
• Size of remote file: 521 kB
files/teaser_10.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:32e88cc8bf7a332d656e7c21996f0fc382072eb6a5a192fc6b03fa199842a65e
+ size 2457728
files/teaser_100.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44bf100a969b99061d597850eb0ed039b1cf79a61f9b9aea40e51fff632a6743
+ size 2457728
files/teaser_1000.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:86c7ef075046d10dd5edee50cca19472b9a268b778a1b1dd01d4474f01b1f3d3
+ size 2457728
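
The `.npy` entries above are Git LFS pointers; once resolved with `git lfs pull`, each holds a 2D sparse depth map with zeros at missing positions, the format that `marigold_dc.py` validates. A quick inspection sketch:

```python
import numpy as np

sparse = np.load("files/kitti_1.npy")
valid = sparse > 0  # zeros mark pixels without a depth measurement

print(sparse.shape, sparse.dtype)
print(f"valid fraction: {valid.mean():.4%}")
print(f"depth range: {sparse[valid].min():.2f} to {sparse[valid].max():.2f}")
```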
marigold_dc.py ADDED
@@ -0,0 +1,186 @@
+ import logging
+ import warnings
+
+ import diffusers
+ import numpy as np
+ import torch
+ from diffusers import MarigoldDepthPipeline
+
+ warnings.simplefilter(action="ignore", category=FutureWarning)
+ diffusers.utils.logging.disable_progress_bar()
+
+
+ class MarigoldDepthCompletionPipeline(MarigoldDepthPipeline):
+     def __call__(
+         self,
+         image,
+         sparse_depth,
+         num_inference_steps=50,
+         processing_resolution=0,
+         seed=2024,
+         dry_run=False,
+     ):
+         # Resolving variables
+         device = self._execution_device
+         generator = torch.Generator(device=device).manual_seed(seed)
+
+         if dry_run:
+             logging.warning("Dry run mode")
+             for i in range(num_inference_steps):
+                 yield np.array(image)[:, :, 0].astype(float), float(np.log(i + 1))
+             return
+
+         # Check inputs.
+         if num_inference_steps is None:
+             raise ValueError("Invalid num_inference_steps")
+         if not isinstance(sparse_depth, np.ndarray) or sparse_depth.ndim != 2:
+             raise ValueError(
+                 "Sparse depth should be a 2D numpy ndarray with zeros at missing positions"
+             )
+
+         with torch.no_grad():
+             # Prepare empty text conditioning
+             if self.empty_text_embedding is None:
+                 prompt = ""
+                 text_inputs = self.tokenizer(
+                     prompt,
+                     padding="do_not_pad",
+                     max_length=self.tokenizer.model_max_length,
+                     truncation=True,
+                     return_tensors="pt",
+                 )
+                 text_input_ids = text_inputs.input_ids.to(device)
+                 self.empty_text_embedding = self.text_encoder(text_input_ids)[
+                     0
+                 ]  # [1,2,1024]
+
+             # Preprocess input images
+             image, padding, original_resolution = self.image_processor.preprocess(
+                 image,
+                 processing_resolution=processing_resolution,
+                 device=device,
+                 dtype=self.dtype,
+             )  # [N,3,PPH,PPW]
+
+         if sparse_depth.shape != original_resolution:
+             raise ValueError(
+                 f"Sparse depth dimensions ({sparse_depth.shape}) must match that of the image ({original_resolution})"
+             )
+         with torch.no_grad():
+             # Encode input image into latent space
+             image_latent, pred_latent = self.prepare_latents(
+                 image, None, generator, 1, 1
+             )  # [N*E,4,h,w], [N*E,4,h,w]
+             del image
+
+         # Preprocess sparse depth
+         sparse_depth = torch.from_numpy(sparse_depth)[None, None].float()
+         sparse_depth = sparse_depth.to(device)
+         sparse_mask = sparse_depth > 0
+
+         # Set up optimization targets: a squared (hence non-negative) scale and
+         # shift of the affine-invariant prediction, plus the depth latent itself.
+         scale = torch.nn.Parameter(torch.ones(1, device=device), requires_grad=True)
+         shift = torch.nn.Parameter(torch.ones(1, device=device), requires_grad=True)
+         pred_latent = torch.nn.Parameter(pred_latent, requires_grad=True)
+
+         sparse_range = (
+             sparse_depth[sparse_mask].max() - sparse_depth[sparse_mask].min()
+         ).item()
+         sparse_lower = (sparse_depth[sparse_mask].min()).item()
+
+         def affine_to_metric(depth):
+             return (scale**2) * sparse_range * depth + (shift**2) * sparse_lower
+
+         def latent_to_metric(latent):
+             affine_invariant_prediction = self.decode_prediction(
+                 latent
+             )  # [E,1,PPH,PPW]
+             prediction = affine_to_metric(affine_invariant_prediction)
+             prediction = self.image_processor.unpad_image(
+                 prediction, padding
+             )  # [E,1,PH,PW]
+             prediction = self.image_processor.resize_antialias(
+                 prediction, original_resolution, "bilinear", is_aa=False
+             )  # [1,1,H,W]
+             return prediction
+
+         def loss_l1l2(pred, target):
+             out_l1 = torch.nn.functional.l1_loss(pred, target)
+             out_l2 = torch.nn.functional.mse_loss(pred, target)
+             out = out_l1 + out_l2
+             return out, out_l2.sqrt()
+
+         optimizer = torch.optim.Adam(
+             [
+                 {"params": [scale, shift], "lr": 0.005},
+                 {"params": [pred_latent], "lr": 0.05},
+             ]
+         )
+
+         # Process the denoising loop
+         self.scheduler.set_timesteps(num_inference_steps, device=device)
+         for t in self.progress_bar(
+             self.scheduler.timesteps, desc=f"Marigold-DC steps ({str(device)})..."
+         ):
+             optimizer.zero_grad()
+
+             batch_latent = torch.cat([image_latent, pred_latent], dim=1)  # [1,8,h,w]
+             noise = self.unet(
+                 batch_latent,
+                 t,
+                 encoder_hidden_states=self.empty_text_embedding,
+                 return_dict=False,
+             )[
+                 0
+             ]  # [1,4,h,w]
+
+             # Compute pred_epsilon to later rescale the depth latent gradient
+             with torch.no_grad():
+                 alpha_prod_t = self.scheduler.alphas_cumprod[t]
+                 beta_prod_t = 1 - alpha_prod_t
+                 pred_epsilon = (alpha_prod_t**0.5) * noise + (
+                     beta_prod_t**0.5
+                 ) * pred_latent
+
+             step_output = self.scheduler.step(
+                 noise, t, pred_latent, generator=generator
+             )
+
+             # Preview the final output depth, compute loss with guidance, backprop
+             pred_original_sample = step_output.pred_original_sample
+             current_metric_estimate = latent_to_metric(pred_original_sample)
+             loss, rmse = loss_l1l2(
+                 current_metric_estimate[sparse_mask], sparse_depth[sparse_mask]
+             )
+             loss.backward()
+
+             # Scale gradients up to match the magnitude of the predicted noise
+             with torch.no_grad():
+                 pred_epsilon_norm = torch.linalg.norm(pred_epsilon).item()
+                 depth_latent_grad_norm = torch.linalg.norm(pred_latent.grad).item()
+                 scaling_factor = pred_epsilon_norm / max(depth_latent_grad_norm, 1e-8)
+                 pred_latent.grad *= scaling_factor
+
+             optimizer.step()
+
+             # Take the scheduler step with the freshly optimized latent
+             with torch.no_grad():
+                 pred_latent.data = self.scheduler.step(
+                     noise, t, pred_latent, generator=generator
+                 ).prev_sample
+
+             yield current_metric_estimate, rmse.item()
+
+         del (
+             pred_original_sample,
+             current_metric_estimate,
+             step_output,
+             pred_epsilon,
+             noise,
+         )
+         torch.cuda.empty_cache()
+
+         # Offload all models
+         self.maybe_free_model_hooks()
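
For reference, the mapping optimized in `affine_to_metric` above can be written as

```latex
\hat{D} = s^{2}\,\bigl(d_{\max} - d_{\min}\bigr)\,d + t^{2}\,d_{\min}
```

where $d$ is the affine-invariant prediction, $s$ and $t$ are the learned scale and shift parameters, and $d_{\min}$, $d_{\max}$ are the extremes over the valid sparse points; squaring $s$ and $t$ keeps both coefficients non-negative without explicit constraints.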
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ diffusers==0.31.0
+ gradio==4.44.1
+ gradio-imageslider==0.0.20
+ accelerate
+ matplotlib
+ numpy
+ pillow
+ plotly
+ scipy
+ spaces
+ torch
+ transformers
+ xformers
+ pandas