Borcherding committed on
Commit a0bc766 · verified · 1 Parent(s): b6801da

Update README.md

Files changed (1)
  1. README.md +246 -17
README.md CHANGED
@@ -7,37 +7,266 @@ tags:
 
 # Depth2Robot GAN Model
 
- This model transforms depth maps into robot-style images, and also transforms robot-style images into estimated depth maps using CycleGAN.
 <div style="display: flex; flex-wrap: wrap; justify-content: center;">
 <div style="display: flex; width: 100%; justify-content: center; margin-bottom: 10px;">
- <img src="testOutput/depth2image/custom_real.png" alt="depth map" title="depth map" width="45%">
- <img src="testOutput/depth2image/custom_fake.png" alt="stylized depth map" title="stylized depth map" width="45%">
 </div>
 <div style="display: flex; width: 100%; justify-content: center;">
- <img src="testOutput/image2depth/custom_real.png" alt="depth map" title="depth map" width="45%">
- <img src="testOutput/image2depth/custom_fake.png" alt="stylized depth map" title="stylized depth map" width="45%">
 </div>
 </div>
- # Model Description
 
- - This model was trained on depth maps and robot images.
- - It converts grayscale depth maps to colorful robot-style imagery.
- - Trained using CycleGAN architecture.
 
- ## Usage
 
 ```python
 import torch
 from huggingface_hub import hf_hub_download
 
 # Download the model
- model_path = hf_hub_download(repo_id="Borcherding/depth2AnythingCycleGAN_RobotsV2", filename="latest_net_G.pth")
 
- # Load the model (you need to define the Generator class)
- model = Generator()
- model.load_state_dict(torch.load(model_path), strict=False)
- model.eval()
 
- # Use the model for inference
- # ...
 ```
 
 # Depth2Robot GAN Model
 
+ This model transforms depth maps into robot-style images, and robot-style images back into estimated depth maps, using the CycleGAN architecture.
+
 <div style="display: flex; flex-wrap: wrap; justify-content: center;">
 <div style="display: flex; width: 100%; justify-content: center; margin-bottom: 10px;">
+ <img src="testOutput/depth2image/custom_real.png" alt="depth map" title="Depth Map (Input)" width="45%">
+ <img src="testOutput/depth2image/custom_fake.png" alt="robot-style image" title="Robot-Style Image (Output)" width="45%">
 </div>
 <div style="display: flex; width: 100%; justify-content: center;">
+ <img src="testOutput/image2depth/custom_real.png" alt="robot-style image" title="Robot-Style Image (Input)" width="45%">
+ <img src="testOutput/image2depth/custom_fake.png" alt="depth map" title="Depth Map (Output)" width="45%">
 </div>
 </div>
 
+ ## Model Description
+
+ - This model was trained on depth maps and robot images using the CycleGAN architecture
+ - It supports bidirectional transformation:
+   - Depth map → Robot-style imagery
+   - Robot-style imagery → Depth map
+ - The model uses a ResNet-based generator with residual blocks
+
+ ## Installation
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/yourusername/depth2robot
+ cd depth2robot
+
+ # Install dependencies (OpenCV is listed under Requirements below)
+ pip install torch torchvision gradio pyvirtualcam opencv-python
+ ```
+
+ ## Usage Options
+
+ ### Option 1: Simple Test Interface
+
+ Run the simple test interface to quickly try out the model:
+
+ ```bash
+ python cycleGANtest.py
+ ```
+
+ This launches a Gradio interface, sketched below, where you can:
+ - Upload an image
+ - Select the conversion direction (Depth to Image or Image to Depth)
+ - Transform the image with a single click
+
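+ A minimal sketch of how such a Gradio interface can be wired up (illustrative only; function and label names here are not taken from `cycleGANtest.py`, and it reuses `transform_image` from the programmatic example below):
+
+ ```python
+ import gradio as gr
+
+ # transform_image is defined in "Using the Model Programmatically" below
+ def run(image_path, direction):
+     return transform_image(image_path, direction)
+
+ demo = gr.Interface(
+     fn=run,
+     inputs=[
+         gr.Image(type="filepath", label="Input image"),
+         gr.Radio(["depth2image", "image2depth"], value="depth2image", label="Direction"),
+     ],
+     outputs=gr.Image(label="Transformed image"),
+     title="Depth2Robot CycleGAN",
+ )
+ demo.launch()
+ ```
+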
+ ### Option 2: Webcam Integration with Depth Estimation
+
+ For a more advanced setup that includes real-time webcam processing with Depth Anything V2:
+
+ ```bash
+ # Set the path to Depth Anything V2
+ export DEPTH_ANYTHING_V2_PATH=/path/to/depth-anything-v2
+
+ # Run the integrated application
+ python integrated-depth-cyclegan.py
+ ```
+
+ This launches a Gradio interface that allows you to:
+ - Capture webcam input
+ - Generate depth maps using Depth Anything V2
+ - Apply a winter-themed colormap to the depth maps (see the sketch after this list)
+ - Apply the CycleGAN transformation in either direction
+ - Output to a virtual camera for use in video conferencing or streaming
+
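+ The winter-themed colormap step can be approximated with OpenCV's built-in `cv2.COLORMAP_WINTER` (a sketch assuming min-max normalization; the integrated script may normalize differently):
+
+ ```python
+ import cv2
+ import numpy as np
+
+ def colorize_depth(depth: np.ndarray) -> np.ndarray:
+     """Map a single-channel depth array to a winter-themed RGB image."""
+     # Scale depth to 0-255 before applying the colormap (assumption)
+     d = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
+     colored_bgr = cv2.applyColorMap(d, cv2.COLORMAP_WINTER)
+     return cv2.cvtColor(colored_bgr, cv2.COLOR_BGR2RGB)
+ ```
+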
+ ## Using the Model Programmatically
 
 ```python
 import torch
+ import torch.nn as nn
+ import numpy as np
+ import torchvision.transforms as transforms
+ from PIL import Image
 from huggingface_hub import hf_hub_download
 
+ # Define the Generator architecture
+ class ResidualBlock(nn.Module):
+     def __init__(self, channels):
+         super(ResidualBlock, self).__init__()
+         self.conv_block = nn.Sequential(
+             nn.ReflectionPad2d(1),
+             nn.Conv2d(channels, channels, 3),
+             nn.InstanceNorm2d(channels),
+             nn.ReLU(inplace=True),
+             nn.ReflectionPad2d(1),
+             nn.Conv2d(channels, channels, 3),
+             nn.InstanceNorm2d(channels)
+         )
+
+     def forward(self, x):
+         return x + self.conv_block(x)
+
+ class Generator(nn.Module):
+     def __init__(self, input_channels=3, output_channels=3, n_residual_blocks=9):
+         super(Generator, self).__init__()
+
+         # Initial convolution
+         model = [
+             nn.ReflectionPad2d(3),
+             nn.Conv2d(input_channels, 64, 7),
+             nn.InstanceNorm2d(64),
+             nn.ReLU(inplace=True)
+         ]
+
+         # Downsampling
+         in_features = 64
+         out_features = in_features * 2
+         for _ in range(2):
+             model += [
+                 nn.Conv2d(in_features, out_features, 3, stride=2, padding=1),
+                 nn.InstanceNorm2d(out_features),
+                 nn.ReLU(inplace=True)
+             ]
+             in_features = out_features
+             out_features = in_features * 2
+
+         # Residual blocks
+         for _ in range(n_residual_blocks):
+             model += [ResidualBlock(in_features)]
+
+         # Upsampling
+         out_features = in_features // 2
+         for _ in range(2):
+             model += [
+                 nn.ConvTranspose2d(in_features, out_features, 3, stride=2, padding=1, output_padding=1),
+                 nn.InstanceNorm2d(out_features),
+                 nn.ReLU(inplace=True)
+             ]
+             in_features = out_features
+             out_features = in_features // 2
+
+         # Output layer
+         model += [
+             nn.ReflectionPad2d(3),
+             nn.Conv2d(64, output_channels, 7),
+             nn.Tanh()
+         ]
+
+         self.model = nn.Sequential(*model)
+
+     def forward(self, x):
+         return self.model(x)
+
 # Download the model
+ def download_model(direction="depth2image"):
+     if direction == "depth2image":
+         filename = "latest_net_G_A.pth"
+     else:  # "image2depth"
+         filename = "latest_net_G_B.pth"
+
+     model_path = hf_hub_download(
+         repo_id="Borcherding/depth2AnythingCycleGAN_RobotsV2",
+         filename=filename
+     )
+     return model_path
 
+ # Image preprocessing
+ def preprocess_image(image):
+     """
+     Preprocess image for model input
+
+     Args:
+         image: PIL Image or numpy array
+
+     Returns:
+         torch.Tensor: Normalized tensor ready for model input
+     """
+     if isinstance(image, np.ndarray):
+         image = Image.fromarray(image.astype('uint8'), 'RGB')
+
+     transform = transforms.Compose([
+         transforms.Resize(256),
+         transforms.ToTensor(),
+         transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
+     ])
+
+     return transform(image).unsqueeze(0)
 
+ # Image postprocessing
+ def postprocess_image(tensor):
+     """
+     Convert model output tensor to numpy image
+
+     Args:
+         tensor: Model output tensor
+
+     Returns:
+         numpy.ndarray: RGB image array (0-255)
+     """
+     tensor = tensor.squeeze(0).cpu()
+     tensor = (tensor + 1) / 2
+     tensor = tensor.clamp(0, 1)
+     tensor = tensor.permute(1, 2, 0).numpy()
+     return (tensor * 255).astype(np.uint8)
+
+ # Example usage
+ def transform_image(input_image_path, direction="depth2image"):
+     """
+     Transform an image using the Depth2Robot model
+
+     Args:
+         input_image_path: Path to input image
+         direction: "depth2image" or "image2depth"
+
+     Returns:
+         numpy.ndarray: Transformed image
+     """
+     # Load model
+     model_path = download_model(direction)
+     model = Generator()
+     model.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False)
+     model.eval()
+
+     # Load and preprocess image
+     input_image = Image.open(input_image_path).convert('RGB')
+     input_tensor = preprocess_image(input_image)
+
+     # Generate output
+     with torch.no_grad():
+         output_tensor = model(input_tensor)
+
+     # Postprocess output
+     output_image = postprocess_image(output_tensor)
+
+     return output_image
 ```
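+
+ As a quick usage example (the file name below is illustrative):
+
+ ```python
+ from PIL import Image
+
+ # Hypothetical input file; any RGB-rendered depth map works
+ robot_img = transform_image("my_depth.png", direction="depth2image")
+ Image.fromarray(robot_img).save("robot_style.png")
+ ```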
+
+ ## Model Checkpoints
+
+ The model checkpoints are available on Hugging Face:
+ - Repository: [Borcherding/depth2AnythingCycleGAN_RobotsV2](https://huggingface.co/Borcherding/depth2AnythingCycleGAN_RobotsV2)
+ - Files:
+   - `latest_net_G_A.pth` - Generator for Depth to Robot Image transformation
+   - `latest_net_G_B.pth` - Generator for Robot Image to Depth transformation
+
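+ You can confirm the available files programmatically with `huggingface_hub` (a quick check, not required for the examples above):
+
+ ```python
+ from huggingface_hub import list_repo_files
+
+ # Prints the repository contents, including latest_net_G_A.pth and latest_net_G_B.pth
+ print(list_repo_files("Borcherding/depth2AnythingCycleGAN_RobotsV2"))
+ ```
+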
+ ## Integration with Depth Anything V2
+
+ The integrated application (`integrated-depth-cyclegan.py`) also leverages [Depth Anything V2](https://github.com/depth-anything/Depth-Anything-V2) for real-time depth estimation, providing a complete pipeline:
+
+ 1. Capture webcam input
+ 2. Generate depth maps with Depth Anything V2
+ 3. Apply CycleGAN transformation
+ 4. Output to virtual camera
+
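+ As a hedged sketch of that loop (the helpers `estimate_depth` and `stylize_frame` are hypothetical stand-ins for Depth Anything V2 inference and the CycleGAN forward pass; resolution and FPS are assumptions, not values from the script):
+
+ ```python
+ import cv2
+ import pyvirtualcam
+
+ cap = cv2.VideoCapture(0)                          # 1. capture webcam input
+ with pyvirtualcam.Camera(width=640, height=480, fps=30) as cam:
+     while True:
+         ok, frame_bgr = cap.read()
+         if not ok:
+             break
+         rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
+         depth = estimate_depth(rgb)                # 2. depth map (hypothetical helper)
+         stylized = stylize_frame(depth)            # 3. CycleGAN transformation (hypothetical helper)
+         frame = cv2.resize(stylized, (640, 480))   # RGB uint8 frame at the camera's size
+         cam.send(frame)                            # 4. virtual camera output
+         cam.sleep_until_next_frame()
+ cap.release()
+ ```
+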
+ ## Requirements
+
+ - Python 3.7+
+ - PyTorch 1.7+
+ - torchvision
+ - gradio
+ - pyvirtualcam (for webcam integration)
+ - OpenCV (cv2)
+ - Depth Anything V2 (for integrated application)
+
+ ## License
+
+ [Insert your license information here]
+
+ ## Acknowledgments
+
+ - This model uses the CycleGAN architecture from the paper [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593) by Zhu et al.
+ - The implementation is based on [junyanz/pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix)
+ - The integrated application leverages Depth Anything V2 for depth estimation