---
license: mit
library_name: transformers
tags:
- Aerial Image Segmentation
- Road Detection
- Semantic Segmentation
- U-Net-50
- Computer Vision
- Remote Sensing
- Urban Planning
- Geographic Information Systems (GIS)
- Deep Learning
datasets:
- balraj98/massachusetts-roads-dataset
---

# Model Card for Model ID

This model card describes a computer vision model that segments roads in aerial imagery using the U-Net-50 architecture. The model identifies and segments road networks, a task that supports applications such as mapping and autonomous driving.

## Model Details

### Model Description

- **Developed by:** [spectrewolf8](https://github.com/Spectrewolf8)
- **Model type:** Computer vision / semantic segmentation
- **License:** MIT

### Model Sources

- **Repository:** https://github.com/Spectrewolf8/aerial-image-road-segmentation-xp

## Uses

### Direct Use

This model can segment road networks from aerial images without additional fine-tuning. It suits scenarios that require detailed and accurate road mapping.

### Downstream Use

When fine-tuned on additional datasets, this model can be adapted to other semantic segmentation tasks in remote sensing.

## How to Get Started with the Model

The script below loads the trained model, runs it on one batch of test images, and plots each image next to its ground truth mask, the raw prediction, and the thresholded prediction.

```python
# Import necessary modules
import random

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

seed = 24
batch_size = 8

# Placeholder paths: replace with the saved model file and your test image/mask folders
model_path = "path"
output_test_image_dir = "path/to/test/images"
output_test_label_dir = "path/to/test/masks"

# Build paired generators that read images and masks from their directories.
# Batches are returned as NumPy arrays.
# Images can be resized by passing target_size=(height, width) to flow_from_directory.
# Our images are already cropped to 256x256, so the target_size parameter can be omitted.
def image_and_mask_generator(image_dir, label_dir):
    img_data_gen_args = dict(rescale=1 / 255.)
    mask_data_gen_args = dict()

    image_data_generator = ImageDataGenerator(**img_data_gen_args)
    image_generator = image_data_generator.flow_from_directory(
        image_dir,
        seed=seed,
        batch_size=batch_size,
        classes=["."],
        class_mode=None,  # Important: yield image batches only, without label arrays
    )

    mask_data_generator = ImageDataGenerator(**mask_data_gen_args)
    mask_generator = mask_data_generator.flow_from_directory(
        label_dir,
        classes=["."],
        seed=seed,
        batch_size=batch_size,
        color_mode='grayscale',  # Read masks in grayscale
        class_mode=None,
    )

    # Print the first few file paths to check that images and masks line up
    print(image_generator.filenames[0:5])
    print(mask_generator.filenames[0:5])

    generator = zip(image_generator, mask_generator)
    return generator

# Intersection over Union (IoU) coefficient
def iou_coef(y_true, y_pred, smooth=1e-6):
    intersection = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) - intersection

    return (intersection + smooth) / (union + smooth)

# Dice coefficient
def dice_coef(y_true, y_pred, smooth=1e-6):
    intersection = tf.reduce_sum(y_true * y_pred)
    total = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred)

    return (2. * intersection + smooth) / (total + smooth)

# Dice loss
def soft_dice_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)

# Wrap the zipped generators so each step yields an (images, masks) tuple
def create_generator(zipped):
    for (img, mask) in zipped:
        yield (img, mask)

# The custom loss and metrics must be passed in so Keras can deserialize the model
u_net_model = load_model(
    model_path,
    custom_objects={'soft_dice_loss': soft_dice_loss, 'dice_coef': dice_coef, "iou_coef": iou_coef},
)

test_generator = create_generator(image_and_mask_generator(output_test_image_dir, output_test_label_dir))

# Fetch one batch of test images and their ground truth masks
images, ground_truth_masks = next(test_generator)

# Make predictions
predictions = u_net_model.predict(images)

# Apply threshold to predictions
thresh_val = 0.8
prediction_threshold = (predictions > thresh_val).astype(np.uint8)

# Visualize results
num_samples = min(10, len(images))  # Use at most 10 samples or the total number of images available
f = plt.figure(figsize=(15, 25))
for i in range(num_samples):
    ix = random.randint(0, len(images) - 1)  # Random index within the batch (repeats are possible)

    f.add_subplot(num_samples, 4, i * 4 + 1)
    plt.imshow(images[ix])
    plt.title("Image")
    plt.axis('off')

    f.add_subplot(num_samples, 4, i * 4 + 2)
    plt.imshow(np.squeeze(ground_truth_masks[ix]))
    plt.title("Ground Truth")
    plt.axis('off')

    f.add_subplot(num_samples, 4, i * 4 + 3)
    plt.imshow(np.squeeze(predictions[ix]))
    plt.title("Prediction")
    plt.axis('off')

    f.add_subplot(num_samples, 4, i * 4 + 4)
    plt.imshow(np.squeeze(prediction_threshold[ix]))
    plt.title(f"Thresholded at {thresh_val}")
    plt.axis('off')

plt.show()
```
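
The threshold of 0.8 used in the script is one choice among many: a lower value marks more low-confidence pixels as road. The NumPy-only sketch below shows this effect on a small, made-up probability map. The array values and the `road_fraction` helper are illustrative assumptions, not model output or part of the repository.

```python
import numpy as np

# Hypothetical 4x4 probability map standing in for one model prediction
probs = np.array([
    [0.05, 0.10, 0.85, 0.95],
    [0.10, 0.60, 0.90, 0.70],
    [0.20, 0.75, 0.82, 0.30],
    [0.05, 0.50, 0.40, 0.10],
])


def road_fraction(prob_map, thresh):
    """Return the fraction of pixels whose probability exceeds thresh."""
    binary = (prob_map > thresh).astype(np.uint8)
    return float(binary.mean())


# Compare a permissive threshold with the stricter one used in the script
for t in (0.5, 0.8):
    print(f"threshold={t}: road fraction={road_fraction(probs, t):.4f}")
```

With these illustrative values, 7 of the 16 pixels pass the 0.5 threshold and 4 of 16 pass 0.8, so the stricter threshold produces a thinner road mask.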

## Training Details

### Training Data

The model was trained on the Massachusetts Roads Dataset, which pairs high-resolution aerial images with road segmentation masks. The images and masks were cropped into 256x256 patches, and the masks were converted to binary format.

### Training Procedure

#### Preprocessing

- Images were cropped into 256x256 patches to manage memory usage and improve training efficiency.
- Masks were binarized to create clear road/non-road classifications.

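
The two preprocessing steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the repository's actual pipeline: the helper names `crop_into_patches` and `binarize_mask`, the choice to discard partial patches at the borders, and the cutoff of 127 are all hypothetical.

```python
import numpy as np


def crop_into_patches(image, patch_size=256):
    """Split an H x W (x C) array into non-overlapping square patches.

    Rows and columns that do not fill a whole patch are discarded
    (an assumption; the original pipeline may pad or overlap instead).
    """
    height, width = image.shape[:2]
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return patches


def binarize_mask(mask, cutoff=127):
    """Map a grayscale mask to {0, 1}; the cutoff of 127 is an assumption."""
    return (mask > cutoff).astype(np.uint8)


# Synthetic stand-ins for one aerial tile and its grayscale mask
tile = np.zeros((1500, 1500, 3), dtype=np.uint8)
mask = np.random.default_rng(0).integers(0, 256, size=(1500, 1500), dtype=np.uint8)

image_patches = crop_into_patches(tile)
mask_patches = [binarize_mask(p) for p in crop_into_patches(mask)]
print(len(image_patches), image_patches[0].shape, np.unique(mask_patches[0]))
```

A 1500x1500 input yields a 5x5 grid of 256x256 patches under this scheme, with the remaining border pixels dropped.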

#### Training Hyperparameters

- **Training regime:** FP32 precision
- **Epochs:** 2
- **Batch Size:** 8
- **Learning Rate:** 0.0001

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on a separate set of aerial images and their ground truth masks from the same dataset.

#### Metrics

- **Intersection over Union (IoU):** Measures the overlap between predicted and actual road areas.
- **Dice Coefficient:** Evaluates the similarity between predicted and ground truth masks.

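
For binary masks, IoU is the number of pixels marked as road in both masks divided by the number marked in either, and Dice is twice the shared count divided by the sum of the two road-pixel counts. The sketch below is a NumPy counterpart of the `iou_coef` and `dice_coef` functions in the getting-started script, applied to two tiny hand-made masks. The function names `iou_score` and `dice_score` and the example arrays are illustrative assumptions.

```python
import numpy as np


def iou_score(y_true, y_pred, smooth=1e-6):
    """Intersection over Union for binary masks, smoothed to avoid 0/0."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    intersection = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - intersection
    return (intersection + smooth) / (union + smooth)


def dice_score(y_true, y_pred, smooth=1e-6):
    """Dice coefficient for binary masks, smoothed to avoid 0/0."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    intersection = np.sum(y_true * y_pred)
    total = np.sum(y_true) + np.sum(y_pred)
    return (2.0 * intersection + smooth) / (total + smooth)


# Hand-made 2x4 masks: 4 road pixels each, 3 of them shared
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
pred = np.array([[1, 0, 0, 0],
                 [1, 1, 1, 0]])

print(f"IoU:  {iou_score(truth, pred):.3f}")
print(f"Dice: {dice_score(truth, pred):.3f}")
```

Here the intersection is 3 pixels and the union is 5, so IoU is about 0.6 and Dice is about 0.75. Dice is never lower than IoU for the same pair of masks, which is worth keeping in mind when comparing reported scores.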

### Results

On the test images, the model achieved 71% accuracy in segmenting road networks, and the evaluation metrics indicate good performance in distinguishing road from non-road areas.

#### Summary

The U-Net-50 model segments road networks effectively, which demonstrates its potential for practical applications in urban planning and autonomous systems.

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** U-Net-50
- **Objective:** Road segmentation in aerial images

### Compute Infrastructure

#### Software

- **Framework:** TensorFlow 2.x
- **Dependencies:** Keras, OpenCV, tifffile

**BibTeX:**

@misc{aerial-image-road-segmentation-with-U-NET-xp,
  author = {spectrewolf8},
  title = {Aerial Image Road Segmentation Using U-Net-50},
  year = {2024},
  howpublished = {\url{https://github.com/Spectrewolf8/aerial-image-road-segmentation-xp}},
}