---
datasets:
- Borcherding/Hed2CoralReef_Annotations
tags:
- hed-to-reef
- image-to-image
- cyclegan
- hed-to-anything
---

# CycleGAN_Hed2CoralReef Model

This model transforms HED (Holistically-Nested Edge Detection) maps into coral reef-style images, and coral reef-style images back into estimated HED maps, using the CycleGAN architecture via junyanz's [pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).
## Model Description

- This model was trained on coral reef images generated with SDXL and their associated HED maps, extracted with pytorch-hed: [Hed2CoralReef_Annotations](https://huggingface.co/datasets/Borcherding/Hed2CoralReef_Annotations)
- It uses the CycleGAN architecture
- Training notebooks and dataset generators can be found in the `src` folder, as well as in the GitHub repo [Ollama-Agent-Roll-Cage/controlnet-2-anything](https://github.com/Ollama-Agent-Roll-Cage/controlnet-2-anything)
- It supports bidirectional transformation:
  - HED map → Coral reef-style imagery
  - Coral reef-style imagery → HED map
- The model uses a ResNet-based generator with residual blocks

## Installation

```bash
# Clone the repository
git clone https://huggingface.co/Borcherding/CycleGAN_Depth2RobotsV2_Blend
cd CycleGAN_Depth2RobotsV2_Blend

# Install dependencies
pip install torch torchvision gradio pyvirtualcam
```

## Usage Options

### Option 1: Simple Test Interface

Run the simple test interface to quickly try out the model:

```bash
python cycleGANtest.py
```

This launches a Gradio interface where you can:

- Upload an image
- Select the conversion direction (Depth to Image or Image to Depth; for this model, the "depth" side corresponds to the HED map)
- Transform the image with a single click

### Option 2: Webcam Integration with Depth Estimation

For a more advanced setup that includes real-time webcam processing with Depth Anything V2:

```bash
# Set the path to Depth Anything V2
export DEPTH_ANYTHING_V2_PATH=/path/to/depth-anything-v2

# Run the integrated application
python discordDepth2AnythingGAN.py
```

This launches a Gradio interface that allows you to:

- Capture webcam input
- Generate depth maps using Depth Anything V2
- Apply a winter-themed colormap to the depth maps
- Apply the CycleGAN transformation in either direction
- Output to a virtual camera for use in video conferencing or streaming

## Using the Model Programmatically

Note that the checkpoint filenames and direction strings below keep the `depth2image`/`image2depth` naming convention of the underlying scripts; for this model, `depth2image` maps an HED map to a coral reef-style image.

```python
import torch
import torch.nn as nn
import numpy as np
import torchvision.transforms as transforms
from PIL import Image
from huggingface_hub import hf_hub_download


# Define the Generator architecture (as used in training)
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv_block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels)
        )

    def forward(self, x):
        return x + self.conv_block(x)


class Generator(nn.Module):
    def __init__(self, input_channels=3, output_channels=3, n_residual_blocks=9):
        super(Generator, self).__init__()

        # Initial convolution
        model = [
            nn.ReflectionPad2d(3),
            nn.Conv2d(input_channels, 64, 7),
            nn.InstanceNorm2d(64),
            nn.ReLU(inplace=True)
        ]

        # Downsampling
        in_features = 64
        out_features = in_features * 2
        for _ in range(2):
            model += [
                nn.Conv2d(in_features, out_features, 3, stride=2, padding=1),
                nn.InstanceNorm2d(out_features),
                nn.ReLU(inplace=True)
            ]
            in_features = out_features
            out_features = in_features * 2

        # Residual blocks
        for _ in range(n_residual_blocks):
            model += [ResidualBlock(in_features)]

        # Upsampling
        out_features = in_features // 2
        for _ in range(2):
            model += [
                nn.ConvTranspose2d(in_features, out_features, 3, stride=2,
                                   padding=1, output_padding=1),
                nn.InstanceNorm2d(out_features),
                nn.ReLU(inplace=True)
            ]
            in_features = out_features
            out_features = in_features // 2

        # Output layer
        model += [
            nn.ReflectionPad2d(3),
            nn.Conv2d(64, output_channels, 7),
            nn.Tanh()
        ]

        self.model = nn.Sequential(*model)

    def forward(self, x):
        return self.model(x)


# Download the requested generator checkpoint from the Hugging Face Hub
def download_model(direction="depth2image"):
    if direction == "depth2image":
        filename = "latest_net_G_A.pth"
    else:  # "image2depth"
        filename = "latest_net_G_B.pth"

    model_path = hf_hub_download(
        repo_id="Borcherding/CycleGAN_Depth2RobotsV2_Blend",
        filename=filename
    )
    return model_path


def preprocess_image(image):
    """
    Preprocess an image for model input.

    Args:
        image: PIL Image or numpy array

    Returns:
        torch.Tensor: Normalized tensor ready for model input
    """
    if isinstance(image, np.ndarray):
        image = Image.fromarray(image.astype('uint8'), 'RGB')

    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    ])
    return transform(image).unsqueeze(0)


def postprocess_image(tensor):
    """
    Convert a model output tensor to a numpy image.

    Args:
        tensor: Model output tensor

    Returns:
        numpy.ndarray: RGB image array (0-255)
    """
    tensor = tensor.squeeze(0).cpu()
    tensor = (tensor + 1) / 2  # map [-1, 1] to [0, 1]
    tensor = tensor.clamp(0, 1)
    tensor = tensor.permute(1, 2, 0).numpy()
    return (tensor * 255).astype(np.uint8)


def transform_image(input_image_path, direction="depth2image"):
    """
    Transform an image with the CycleGAN generator.

    Args:
        input_image_path: Path to input image
        direction: "depth2image" or "image2depth"

    Returns:
        numpy.ndarray: Transformed image
    """
    # Load model
    model_path = download_model(direction)
    model = Generator()
    model.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False)
    model.eval()

    # Load and preprocess image
    input_image = Image.open(input_image_path).convert('RGB')
    input_tensor = preprocess_image(input_image)

    # Generate output
    with torch.no_grad():
        output_tensor = model(input_tensor)

    # Postprocess output
    output_image = postprocess_image(output_tensor)
    return output_image
```
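For example, to turn an HED map into a coral reef-style image and save the result (a minimal usage sketch; the filenames are placeholders):

```python
# Hypothetical input/output paths; "depth2image" is the HED-map -> image direction
result = transform_image("hed_map.png", direction="depth2image")
Image.fromarray(result).save("coral_reef_style.png")
```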
## Model Checkpoints

The model checkpoints are available on Hugging Face:

- Repository: [Borcherding/CycleGAN_Depth2RobotsV2_Blend](https://huggingface.co/Borcherding/CycleGAN_Depth2RobotsV2_Blend)
- Files:
  - `latest_net_G_A.pth` - Generator for the forward (HED map → coral reef-style image) transformation
  - `latest_net_G_B.pth` - Generator for the reverse (coral reef-style image → HED map) transformation

## Integration with Depth Anything V2

The integrated application (`discordDepth2AnythingGAN.py`) also leverages [Depth Anything V2](https://github.com/depth-anything/Depth-Anything-V2) for real-time depth estimation, providing a complete pipeline:

1. Capture webcam input
2. Generate depth maps with Depth Anything V2
3. Apply the CycleGAN transformation
4. Output to a virtual camera

A minimal sketch of this loop is included at the end of this README.

## Requirements

- Python 3.7+
- PyTorch 1.7+
- torchvision
- gradio
- pyvirtualcam (for webcam integration)
- OpenCV (cv2)
- Depth Anything V2 (for the integrated application)

## License

[Insert your license information here]

## Acknowledgments

- This model uses the CycleGAN architecture from the paper [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593) by Zhu et al.
- The implementation is based on [junyanz/pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix)
- The integrated application leverages Depth Anything V2 for depth estimation
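## Appendix: Webcam Loop Sketch

As referenced in the Depth Anything V2 integration section, here is a minimal sketch of that webcam loop. It is not the actual `discordDepth2AnythingGAN.py` implementation: `estimate_depth` is a hypothetical stand-in for Depth Anything V2 inference, `COLORMAP_WINTER` is only a guess at the winter-themed colormap, and `Generator`, `download_model`, `preprocess_image`, and `postprocess_image` are the helpers defined in the programmatic example above.

```python
import cv2
import numpy as np
import pyvirtualcam
import torch

# Hypothetical stand-in for Depth Anything V2 inference; it should return a
# float depth map with the same height and width as the input RGB frame.
def estimate_depth(frame_rgb):
    raise NotImplementedError("plug in Depth Anything V2 inference here")

# Load the depth2image generator via the helpers defined earlier in this card
model = Generator()
model.load_state_dict(torch.load(download_model("depth2image"), map_location="cpu"), strict=False)
model.eval()

cap = cv2.VideoCapture(0)
with pyvirtualcam.Camera(width=256, height=256, fps=20) as cam:
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)

        # Depth estimation, normalized to 0-255 and given a winter-themed colormap
        depth = estimate_depth(frame_rgb)
        depth_u8 = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        depth_bgr = cv2.applyColorMap(depth_u8, cv2.COLORMAP_WINTER)
        depth_rgb = cv2.cvtColor(depth_bgr, cv2.COLOR_BGR2RGB)

        # CycleGAN transformation (depth/HED map -> stylized image)
        with torch.no_grad():
            stylized = postprocess_image(model(preprocess_image(depth_rgb)))

        # Send the frame to the virtual camera, resized to its resolution
        cam.send(cv2.resize(stylized, (cam.width, cam.height)))
        cam.sleep_until_next_frame()

cap.release()
```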