|
--- |
|
datasets: |
|
- Borcherding/Hed2CoralReef_Annotations |
|
tags: |
|
- hed-to-reef |
|
- image-to-image |
|
- cyclegan |
|
- hed-to-anything |
|
--- |
|
|
|
# CycleGAN_Hed2CoralReef Model |
|
|
|
This model transforms HED (Holistically-Nested Edge Detection) maps into coral reef-style images, and also transforms coral reef-style images back into estimated HED maps, using the CycleGAN architecture via junyanz's [pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).
|
|
|
<div style="display: flex; flex-wrap: wrap; justify-content: center;"> |
|
<div style="display: flex; width: 100%; justify-content: center; margin-bottom: 10px;"> |
|
<img src="hed2image_fixed_testA/hed2image/test_latest/images/custom_real.png" alt="depth map" title="Depth Map (Input)" width="45%"> |
|
<img src="hed2image_fixed_testA/hed2image/test_latest/images/custom_fake.png" alt="robot-style image" title="Robot-Style Image (Output)" width="45%"> |
|
</div> |
|
<div style="display: flex; width: 100%; justify-content: center;"> |
|
<img src="hed2image_fixed_testB/hed2image/test_latest/images/custom_real.png" alt="robot-style image" title="Robot-Style Image (Input)" width="45%"> |
|
<img src="hed2image_fixed_testB/hed2image/test_latest/images/custom_fake.png" alt="depth map" title="Depth Map (Output)" width="45%"> |
|
</div> |
|
</div> |
|
|
|
## Model Description |
|
|
|
- This model was trained on coral reef images generated with SDXL and their associated HED maps, extracted with pytorch-hed:

  [Hed2CoralReef_Annotations](https://huggingface.co/datasets/Borcherding/Hed2CoralReef_Annotations)

- It uses the CycleGAN architecture

- Training notebooks and dataset generators can be found in the `src` folder, and also in the GitHub repo [Ollama-Agent-Roll-Cage/controlnet-2-anything](https://github.com/Ollama-Agent-Roll-Cage/controlnet-2-anything) (a minimal HED-extraction sketch follows this list)

- It supports bidirectional transformation:

  - HED map → coral reef-style imagery

  - Coral reef-style imagery → HED map

- The model uses a ResNet-based generator with residual blocks
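
For illustration only, below is a minimal sketch of how paired HED maps could be extracted from a folder of SDXL-generated coral reef images. It uses `controlnet_aux`'s `HEDdetector` as a stand-in for pytorch-hed, and the folder names are assumptions rather than the repo's actual dataset generator:

```python
# Hypothetical sketch: extract HED edge maps for a folder of SDXL-generated
# coral reef images (controlnet_aux's HEDdetector used as a stand-in for
# pytorch-hed; folder names are placeholders).
from pathlib import Path

from PIL import Image
from controlnet_aux import HEDdetector

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")

src_dir = Path("coral_reef_images")   # domain B: SDXL renders (assumed path)
dst_dir = Path("hed_maps")            # domain A: HED edge maps (assumed path)
dst_dir.mkdir(exist_ok=True)

for img_path in sorted(src_dir.glob("*.png")):
    image = Image.open(img_path).convert("RGB")
    hed_map = hed(image)              # returns the HED edge map as a PIL image
    hed_map.save(dst_dir / img_path.name)
```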
|
|
|
## Installation |
|
|
|
```bash |
|
# Clone the repository |
|
git clone https://huggingface.co/Borcherding/CycleGAN_Depth2RobotsV2_Blend |
|
cd CycleGAN_Depth2RobotsV2_Blend
|
|
|
# Install dependencies |
|
pip install torch torchvision gradio pyvirtualcam |
|
``` |
|
|
|
## Usage Options |
|
|
|
### Option 1: Simple Test Interface |
|
|
|
Run the simple test interface to quickly try out the model: |
|
|
|
```bash |
|
python cycleGANtest.py |
|
``` |
|
|
|
This launches a Gradio interface, sketched after this list, where you can:
|
- Upload an image |
|
- Select conversion direction (Depth to Image or Image to Depth) |
|
- Transform the image with a single click |
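
The actual `cycleGANtest.py` is not reproduced here; the following is a rough, hypothetical sketch of what such an interface can look like, assuming the helper functions from the "Using the Model Programmatically" section below are saved in a module named `cyclegan_infer` (a made-up name):

```python
# Hypothetical sketch of a Gradio test UI (not the actual cycleGANtest.py).
# Assumes the helpers from "Using the Model Programmatically" below are saved
# in a module named cyclegan_infer (made-up module name).
import gradio as gr

from cyclegan_infer import transform_image  # hypothetical import

def run(image_path, direction):
    # Map the UI label to the direction string expected by transform_image.
    key = "depth2image" if direction == "Depth to Image" else "image2depth"
    return transform_image(image_path, direction=key)

demo = gr.Interface(
    fn=run,
    inputs=[
        gr.Image(type="filepath", label="Input image"),
        gr.Radio(["Depth to Image", "Image to Depth"], value="Depth to Image", label="Direction"),
    ],
    outputs=gr.Image(type="numpy", label="Transformed image"),
    title="CycleGAN test interface (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```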
|
|
|
### Option 2: Webcam Integration with Depth Estimation |
|
|
|
For a more advanced setup that includes real-time webcam processing with Depth Anything V2: |
|
|
|
```bash |
|
# Set the path to Depth Anything V2 |
|
export DEPTH_ANYTHING_V2_PATH=/path/to/depth-anything-v2 |
|
|
|
# Run the integrated application |
|
python discordDepth2AnythingGAN.py |
|
``` |
|
|
|
This launches a Gradio interface that allows you to: |
|
- Capture webcam input |
|
- Generate depth maps using Depth Anything V2 |
|
- Apply a winter-themed colormap to depth maps
|
- Apply CycleGAN transformation in either direction |
|
- Output to a virtual camera for use in video conferencing or streaming |
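
The virtual-camera output stage (the last item above) can be sketched roughly as follows; this is not the actual `discordDepth2AnythingGAN.py`, and the depth/CycleGAN transform is left as a placeholder:

```python
# Minimal webcam -> (placeholder transform) -> virtual camera loop, as a sketch
# of the output stage only; not the actual discordDepth2AnythingGAN.py.
import cv2
import pyvirtualcam

cap = cv2.VideoCapture(0)
width, height, fps = 640, 480, 30

with pyvirtualcam.Camera(width=width, height=height, fps=fps) as cam:
    print(f"Virtual camera started: {cam.device}")
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        frame_bgr = cv2.resize(frame_bgr, (width, height))
        # Placeholder: depth estimation + CycleGAN transform would go here.
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        cam.send(frame_rgb)          # pyvirtualcam expects an RGB uint8 frame
        cam.sleep_until_next_frame()
cap.release()
```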
|
|
|
## Using the Model Programmatically |
|
|
|
```python |
|
import torch

import torch.nn as nn
|
import numpy as np |
|
import torchvision.transforms as transforms |
|
from PIL import Image |
|
from huggingface_hub import hf_hub_download |
|
|
|
# Define the generator architecture (CycleGAN ResNet generator with residual blocks)
|
class ResidualBlock(nn.Module): |
|
def __init__(self, channels): |
|
super(ResidualBlock, self).__init__() |
|
self.conv_block = nn.Sequential( |
|
nn.ReflectionPad2d(1), |
|
nn.Conv2d(channels, channels, 3), |
|
nn.InstanceNorm2d(channels), |
|
nn.ReLU(inplace=True), |
|
nn.ReflectionPad2d(1), |
|
nn.Conv2d(channels, channels, 3), |
|
nn.InstanceNorm2d(channels) |
|
) |
|
|
|
def forward(self, x): |
|
return x + self.conv_block(x) |
|
|
|
class Generator(nn.Module): |
|
def __init__(self, input_channels=3, output_channels=3, n_residual_blocks=9): |
|
super(Generator, self).__init__() |
|
|
|
# Initial convolution |
|
model = [ |
|
nn.ReflectionPad2d(3), |
|
nn.Conv2d(input_channels, 64, 7), |
|
nn.InstanceNorm2d(64), |
|
nn.ReLU(inplace=True) |
|
] |
|
|
|
# Downsampling |
|
in_features = 64 |
|
out_features = in_features * 2 |
|
for _ in range(2): |
|
model += [ |
|
nn.Conv2d(in_features, out_features, 3, stride=2, padding=1), |
|
nn.InstanceNorm2d(out_features), |
|
nn.ReLU(inplace=True) |
|
] |
|
in_features = out_features |
|
out_features = in_features * 2 |
|
|
|
# Residual blocks |
|
for _ in range(n_residual_blocks): |
|
model += [ResidualBlock(in_features)] |
|
|
|
# Upsampling |
|
out_features = in_features // 2 |
|
for _ in range(2): |
|
model += [ |
|
nn.ConvTranspose2d(in_features, out_features, 3, stride=2, padding=1, output_padding=1), |
|
nn.InstanceNorm2d(out_features), |
|
nn.ReLU(inplace=True) |
|
] |
|
in_features = out_features |
|
out_features = in_features // 2 |
|
|
|
# Output layer |
|
model += [ |
|
nn.ReflectionPad2d(3), |
|
nn.Conv2d(64, output_channels, 7), |
|
nn.Tanh() |
|
] |
|
|
|
self.model = nn.Sequential(*model) |
|
|
|
def forward(self, x): |
|
return self.model(x) |
|
|
|
# Download the model |
|
def download_model(direction="depth2image"): |
|
if direction == "depth2image": |
|
filename = "latest_net_G_A.pth" |
|
else: # "image2depth" |
|
filename = "latest_net_G_B.pth" |
|
|
|
model_path = hf_hub_download( |
|
repo_id="Borcherding/CycleGAN_Depth2RobotsV2_Blend", |
|
filename=filename |
|
) |
|
return model_path |
|
|
|
# Image preprocessing |
|
def preprocess_image(image): |
|
""" |
|
Preprocess image for model input |
|
|
|
Args: |
|
image: PIL Image or numpy array |
|
|
|
Returns: |
|
torch.Tensor: Normalized tensor ready for model input |
|
""" |
|
if isinstance(image, np.ndarray): |
|
image = Image.fromarray(image.astype('uint8'), 'RGB') |
|
|
|
transform = transforms.Compose([ |
|
transforms.Resize(256), |
|
transforms.ToTensor(), |
|
transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]) |
|
]) |
|
|
|
return transform(image).unsqueeze(0) |
|
|
|
# Image postprocessing |
|
def postprocess_image(tensor): |
|
""" |
|
Convert model output tensor to numpy image |
|
|
|
Args: |
|
tensor: Model output tensor |
|
|
|
Returns: |
|
numpy.ndarray: RGB image array (0-255) |
|
""" |
|
tensor = tensor.squeeze(0).cpu() |
|
tensor = (tensor + 1) / 2 |
|
tensor = tensor.clamp(0, 1) |
|
tensor = tensor.permute(1, 2, 0).numpy() |
|
return (tensor * 255).astype(np.uint8) |
|
|
|
# Example usage |
|
def transform_image(input_image_path, direction="depth2image"): |
|
""" |
|
    Transform an image using the CycleGAN generator
|
|
|
Args: |
|
input_image_path: Path to input image |
|
direction: "depth2image" or "image2depth" |
|
|
|
Returns: |
|
numpy.ndarray: Transformed image |
|
""" |
|
# Load model |
|
model_path = download_model(direction) |
|
model = Generator() |
|
model.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False) |
|
model.eval() |
|
|
|
# Load and preprocess image |
|
input_image = Image.open(input_image_path).convert('RGB') |
|
input_tensor = preprocess_image(input_image) |
|
|
|
# Generate output |
|
with torch.no_grad(): |
|
output_tensor = model(input_tensor) |
|
|
|
# Postprocess output |
|
output_image = postprocess_image(output_tensor) |
|
|
|
return output_image |
|
``` |
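
A quick usage note (the file names below are placeholders):

```python
# Turn a HED/depth map into a styled image and save the result.
from PIL import Image

output = transform_image("input_map.png", direction="depth2image")  # placeholder path
Image.fromarray(output).save("output_image.png")
```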
|
|
|
## Model Checkpoints |
|
|
|
The model checkpoints are available on Hugging Face: |
|
- Repository: [Borcherding/CycleGAN_Depth2RobotsV2_Blend](https://huggingface.co/Borcherding/CycleGAN_Depth2RobotsV2_Blend) (the `repo_id` used in the snippet above)

- Files:

  - `latest_net_G_A.pth` - Generator for the forward direction (HED map → coral reef-style image)

  - `latest_net_G_B.pth` - Generator for the reverse direction (coral reef-style image → HED map)
|
|
|
## Integration with Depth Anything V2 |
|
|
|
The integrated application (`discordDepth2AnythingGAN.py`) also leverages [Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2) for real-time depth estimation, providing a complete pipeline (a short sketch of the depth-estimation step follows the list):
|
|
|
1. Capture webcam input |
|
2. Generate depth maps with Depth Anything V2 |
|
3. Apply CycleGAN transformation |
|
4. Output to virtual camera |
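
As a rough sketch of step 2 (plus the winter colormap mentioned earlier), assuming the Depth Anything V2 repo is on `PYTHONPATH` and the ViT-S checkpoint has been downloaded to the path shown (both assumptions):

```python
# Sketch of the depth-estimation step with Depth Anything V2 plus a winter
# colormap; checkpoint path and input frame are placeholders.
import cv2
import numpy as np
import torch
from depth_anything_v2.dpt import DepthAnythingV2

model = DepthAnythingV2(encoder="vits", features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(torch.load("checkpoints/depth_anything_v2_vits.pth", map_location="cpu"))
model.eval()

frame_bgr = cv2.imread("webcam_frame.png")   # placeholder for a captured frame
depth = model.infer_image(frame_bgr)         # HxW depth map (numpy, float)
depth_u8 = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
depth_winter = cv2.applyColorMap(depth_u8, cv2.COLORMAP_WINTER)
cv2.imwrite("depth_winter.png", depth_winter)
```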
|
|
|
## Requirements |
|
|
|
- Python 3.7+ |
|
- PyTorch 1.7+ |
|
- torchvision |
|
- gradio |
|
- pyvirtualcam (for webcam integration) |
|
- OpenCV (cv2) |
|
- Depth Anything V2 (for integrated application) |
|
|
|
## License |
|
|
|
[Insert your license information here] |
|
|
|
## Acknowledgments |
|
|
|
- This model uses CycleGAN architecture from the paper [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593) by Zhu et al. |
|
- The implementation is based on [junyanz/pytorch-CycleGAN-and-pix2pix](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix) |
|
- The integrated application leverages Depth Anything V2 for depth estimation