Borcherding committed on
Commit a0bc766 · verified · 1 Parent(s): b6801da

Update README.md

Files changed (1)
  1. README.md +246 -17
README.md CHANGED
@@ -7,37 +7,266 @@ tags:
 
 # Depth2Robot GAN Model
 
- This model transforms depth maps into robot-style images, and also transforms robot-style images into estimated depth maps using CycleGAN.
 <div style="display: flex; flex-wrap: wrap; justify-content: center;">
 <div style="display: flex; width: 100%; justify-content: center; margin-bottom: 10px;">
- <img src="testOutput/depth2image/custom_real.png" alt="depth map" title="depth map" width="45%">
- <img src="testOutput/depth2image/custom_fake.png" alt="stylized depth map" title="stylized depth map" width="45%">
 </div>
 <div style="display: flex; width: 100%; justify-content: center;">
- <img src="testOutput/image2depth/custom_real.png" alt="depth map" title="depth map" width="45%">
- <img src="testOutput/image2depth/custom_fake.png" alt="stylized depth map" title="stylized depth map" width="45%">
 </div>
 </div>
- # Model Description
 
- - This model was trained on depth maps and robot images.
- - It converts grayscale depth maps to colorful robot-style imagery.
- - Trained using CycleGAN architecture.
 
- ## Usage
 
 ```python
 import torch
 from huggingface_hub import hf_hub_download
 
 # Download the model
- model_path = hf_hub_download(repo_id="Borcherding/depth2AnythingCycleGAN_RobotsV2", filename="latest_net_G.pth")
 
- # Load the model (you need to define the Generator class)
- model = Generator()
- model.load_state_dict(torch.load(model_path), strict=False)
- model.eval()
 
- # Use the model for inference
- # ...
 ```
 
 # Depth2Robot GAN Model
 
+ This model transforms depth maps into robot-style images, and robot-style images back into estimated depth maps, using the CycleGAN architecture.
+
 <div style="display: flex; flex-wrap: wrap; justify-content: center;">
 <div style="display: flex; width: 100%; justify-content: center; margin-bottom: 10px;">
+ <img src="testOutput/depth2image/custom_real.png" alt="depth map" title="Depth Map (Input)" width="45%">
+ <img src="testOutput/depth2image/custom_fake.png" alt="robot-style image" title="Robot-Style Image (Output)" width="45%">
 </div>
 <div style="display: flex; width: 100%; justify-content: center;">
+ <img src="testOutput/image2depth/custom_real.png" alt="robot-style image" title="Robot-Style Image (Input)" width="45%">
+ <img src="testOutput/image2depth/custom_fake.png" alt="depth map" title="Depth Map (Output)" width="45%">
 </div>
 </div>
 
+ ## Model Description
+
+ - This model was trained on depth maps and robot images using the CycleGAN architecture
+ - It supports bidirectional transformation:
+   - Depth map → Robot-style imagery
+   - Robot-style imagery → Depth map
+ - The model uses a ResNet-based generator with residual blocks
+
+ ## Installation
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/yourusername/depth2robot
+ cd depth2robot
+
+ # Install dependencies (OpenCV is listed under Requirements below)
+ pip install torch torchvision gradio pyvirtualcam opencv-python
+ ```
+
+ ## Usage Options
+
+ ### Option 1: Simple Test Interface
+
+ Run the simple test interface to quickly try out the model:
+
+ ```bash
+ python cycleGANtest.py
+ ```
+
+ This launches a Gradio interface, sketched below, where you can:
+ - Upload an image
+ - Select the conversion direction (Depth to Image or Image to Depth)
+ - Transform the image with a single click
+
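+ A minimal sketch of how such a Gradio interface can be wired up (illustrative only; function and label names here are not taken from `cycleGANtest.py`, and it reuses `transform_image` from the programmatic example below):
+
+ ```python
+ import gradio as gr
+
+ # transform_image is defined in "Using the Model Programmatically" below
+ def run(image_path, direction):
+     return transform_image(image_path, direction)
+
+ demo = gr.Interface(
+     fn=run,
+     inputs=[
+         gr.Image(type="filepath", label="Input image"),
+         gr.Radio(["depth2image", "image2depth"], value="depth2image", label="Direction"),
+     ],
+     outputs=gr.Image(label="Transformed image"),
+     title="Depth2Robot CycleGAN",
+ )
+ demo.launch()
+ ```
+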
+ ### Option 2: Webcam Integration with Depth Estimation
+
+ For a more advanced setup that includes real-time webcam processing with Depth Anything V2:
+
+ ```bash
+ # Set the path to Depth Anything V2
+ export DEPTH_ANYTHING_V2_PATH=/path/to/depth-anything-v2
+
+ # Run the integrated application
+ python integrated-depth-cyclegan.py
+ ```
+
+ This launches a Gradio interface that allows you to:
+ - Capture webcam input
+ - Generate depth maps using Depth Anything V2
+ - Apply a winter-themed colormap to the depth maps (see the sketch after this list)
+ - Apply the CycleGAN transformation in either direction
+ - Output to a virtual camera for use in video conferencing or streaming
+
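+ The winter-themed colormap step can be approximated with OpenCV's built-in `cv2.COLORMAP_WINTER` (a sketch assuming min-max normalization; the integrated script may normalize differently):
+
+ ```python
+ import cv2
+ import numpy as np
+
+ def colorize_depth(depth: np.ndarray) -> np.ndarray:
+     """Map a single-channel depth array to a winter-themed RGB image."""
+     # Scale depth to 0-255 before applying the colormap (assumption)
+     d = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
+     colored_bgr = cv2.applyColorMap(d, cv2.COLORMAP_WINTER)
+     return cv2.cvtColor(colored_bgr, cv2.COLOR_BGR2RGB)
+ ```
+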
+ ## Using the Model Programmatically
 
 ```python
 import torch
+ import torch.nn as nn
+ import numpy as np
+ import torchvision.transforms as transforms
+ from PIL import Image
 from huggingface_hub import hf_hub_download
 
+ # Define the Generator architecture
+ class ResidualBlock(nn.Module):
+     def __init__(self, channels):
+         super(ResidualBlock, self).__init__()
+         self.conv_block = nn.Sequential(
+             nn.ReflectionPad2d(1),
+             nn.Conv2d(channels, channels, 3),
+             nn.InstanceNorm2d(channels),
+             nn.ReLU(inplace=True),
+             nn.ReflectionPad2d(1),
+             nn.Conv2d(channels, channels, 3),
+             nn.InstanceNorm2d(channels)
+         )
+
+     def forward(self, x):
+         return x + self.conv_block(x)
+
+ class Generator(nn.Module):
+     def __init__(self, input_channels=3, output_channels=3, n_residual_blocks=9):
+         super(Generator, self).__init__()
+
+         # Initial convolution
+         model = [
+             nn.ReflectionPad2d(3),
+             nn.Conv2d(input_channels, 64, 7),
+             nn.InstanceNorm2d(64),
+             nn.ReLU(inplace=True)
+         ]
+
+         # Downsampling
+         in_features = 64
+         out_features = in_features * 2
+         for _ in range(2):
+             model += [
+                 nn.Conv2d(in_features, out_features, 3, stride=2, padding=1),
+                 nn.InstanceNorm2d(out_features),
+                 nn.ReLU(inplace=True)
+             ]
+             in_features = out_features
+             out_features = in_features * 2
+
+         # Residual blocks
+         for _ in range(n_residual_blocks):
+             model += [ResidualBlock(in_features)]
+
+         # Upsampling
+         out_features = in_features // 2
+         for _ in range(2):
+             model += [
+                 nn.ConvTranspose2d(in_features, out_features, 3, stride=2, padding=1, output_padding=1),
+                 nn.InstanceNorm2d(out_features),
+                 nn.ReLU(inplace=True)
+             ]
+             in_features = out_features
+             out_features = in_features // 2
+
+         # Output layer
+         model += [
+             nn.ReflectionPad2d(3),
+             nn.Conv2d(64, output_channels, 7),
+             nn.Tanh()
+         ]
+
+         self.model = nn.Sequential(*model)
+
+     def forward(self, x):
+         return self.model(x)
+
 # Download the model
+ def download_model(direction="depth2image"):
+     if direction == "depth2image":
+         filename = "latest_net_G_A.pth"
+     else:  # "image2depth"
+         filename = "latest_net_G_B.pth"
+
+     model_path = hf_hub_download(
+         repo_id="Borcherding/depth2AnythingCycleGAN_RobotsV2",
+         filename=filename
+     )
+     return model_path
 
+ # Image preprocessing
+ def preprocess_image(image):
+     """
+     Preprocess image for model input
+
+     Args:
+         image: PIL Image or numpy array
+
+     Returns:
+         torch.Tensor: Normalized tensor ready for model input
+     """
+     if isinstance(image, np.ndarray):
+         image = Image.fromarray(image.astype('uint8'), 'RGB')
+
+     transform = transforms.Compose([
+         transforms.Resize(256),
+         transforms.ToTensor(),
+         transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
+     ])
+
+     return transform(image).unsqueeze(0)
 
+ # Image postprocessing
+ def postprocess_image(tensor):
+     """
+     Convert model output tensor to numpy image
+
+     Args:
+         tensor: Model output tensor
+
+     Returns:
+         numpy.ndarray: RGB image array (0-255)
+     """
+     tensor = tensor.squeeze(0).cpu()
+     tensor = (tensor + 1) / 2
+     tensor = tensor.clamp(0, 1)
+     tensor = tensor.permute(1, 2, 0).numpy()
+     return (tensor * 255).astype(np.uint8)
+
+ # Example usage
+ def transform_image(input_image_path, direction="depth2image"):
+     """
+     Transform an image using the Depth2Robot model
+
+     Args:
+         input_image_path: Path to input image
+         direction: "depth2image" or "image2depth"
+
+     Returns:
+         numpy.ndarray: Transformed image
+     """
+     # Load model
+     model_path = download_model(direction)
+     model = Generator()
+     model.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False)
+     model.eval()
+
+     # Load and preprocess image
+     input_image = Image.open(input_image_path).convert('RGB')
+     input_tensor = preprocess_image(input_image)
+
+     # Generate output
+     with torch.no_grad():
+         output_tensor = model(input_tensor)
+
+     # Postprocess output
+     output_image = postprocess_image(output_tensor)
+
+     return output_image
 ```
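+
+ As a quick usage example (the file name below is illustrative):
+
+ ```python
+ from PIL import Image
+
+ # Hypothetical input file; any RGB-rendered depth map works
+ robot_img = transform_image("my_depth.png", direction="depth2image")
+ Image.fromarray(robot_img).save("robot_style.png")
+ ```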
+
+ ## Model Checkpoints
+
+ The model checkpoints are available on Hugging Face:
+ - Repository: [Borcherding/depth2AnythingCycleGAN_RobotsV2](https://huggingface.co/Borcherding/depth2AnythingCycleGAN_RobotsV2)
+ - Files:
+   - `latest_net_G_A.pth` - Generator for Depth to Robot Image transformation
+   - `latest_net_G_B.pth` - Generator for Robot Image to Depth transformation
+
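+ You can confirm the available files programmatically with `huggingface_hub` (a quick check, not required for the examples above):
+
+ ```python
+ from huggingface_hub import list_repo_files
+
+ # Prints the repository contents, including latest_net_G_A.pth and latest_net_G_B.pth
+ print(list_repo_files("Borcherding/depth2AnythingCycleGAN_RobotsV2"))
+ ```
+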
+ ## Integration with Depth Anything V2
+
+ The integrated application (`integrated-depth-cyclegan.py`) also leverages [Depth Anything V2](https://github.com/depth-anything/Depth-Anything-V2) for real-time depth estimation, providing a complete pipeline:
+
+ 1. Capture webcam input
+ 2. Generate depth maps with Depth Anything V2
+ 3. Apply CycleGAN transformation
+ 4. Output to virtual camera
+
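+ As a hedged sketch of that loop (the helpers `estimate_depth` and `stylize_frame` are hypothetical stand-ins for Depth Anything V2 inference and the CycleGAN forward pass; resolution and FPS are assumptions, not values from the script):
+
+ ```python
+ import cv2
+ import pyvirtualcam
+
+ cap = cv2.VideoCapture(0)                          # 1. capture webcam input
+ with pyvirtualcam.Camera(width=640, height=480, fps=30) as cam:
+     while True:
+         ok, frame_bgr = cap.read()
+         if not ok:
+             break
+         rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
+         depth = estimate_depth(rgb)                # 2. depth map (hypothetical helper)
+         stylized = stylize_frame(depth)            # 3. CycleGAN transformation (hypothetical helper)
+         frame = cv2.resize(stylized, (640, 480))   # RGB uint8 frame at the camera's size
+         cam.send(frame)                            # 4. virtual camera output
+         cam.sleep_until_next_frame()
+ cap.release()
+ ```
+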
+ ## Requirements
+
+ - Python 3.7+
+ - PyTorch 1.7+
+ - torchvision
+ - gradio
+ - pyvirtualcam (for webcam integration)
+ - OpenCV (cv2)
+ - Depth Anything V2 (for integrated application)
+
+ ## License
+
+ [Insert your license information here]
+
+ ## Acknowledgments
+
+ - This model uses the CycleGAN architecture from the paper [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593) by Zhu et al.
+ - The implementation is based on [junyanz/pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix)
+ - The integrated application leverages Depth Anything V2 for depth estimation