---
license: apache-2.0
datasets:
- prithivMLmods/Multilabel-GeoSceneNet-16K
library_name: transformers
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
tags:
- Structures
- Desert
- Glacier
- Street
- Ocean
- Image-Classifier
- art
- Mountain
---

![DCV.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/b3meMjfW6qOwWkuE-UCKQ.png)

# **Multilabel-GeoSceneNet**

> **Multilabel-GeoSceneNet** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **multi-label** image classification. It is designed to recognize and label multiple geographic or environmental elements in a single image using the **SiglipForImageClassification** architecture.

```py
Classification Report:
                          precision    recall  f1-score   support

Buildings and Structures     0.8881    0.9498    0.9179      2190
                  Desert     0.9649    0.9480    0.9564      2000
             Forest Area     0.9807    0.9855    0.9831      2271
        Hill or Mountain     0.8616    0.8993    0.8800      2512
             Ice Glacier     0.9114    0.8382    0.8732      2404
            Sea or Ocean     0.9328    0.9525    0.9426      2274
             Street View     0.9476    0.9106    0.9287      2382

                accuracy                         0.9245     16033
               macro avg     0.9267    0.9263    0.9260     16033
            weighted avg     0.9253    0.9245    0.9244     16033
```

![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Ld-vFb2MWg43wAG5pyFZb.png)

---

The model predicts the presence of one or more of the following **7 geographic scene categories**:

```
Class 0: "Buildings and Structures"
Class 1: "Desert"
Class 2: "Forest Area"
Class 3: "Hill or Mountain"
Class 4: "Ice Glacier"
Class 5: "Sea or Ocean"
Class 6: "Street View"
```

---

## **Install dependencies**

```python
!pip install -q transformers torch pillow gradio
```

---

## **Inference Code**

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Multilabel-GeoSceneNet"  # Updated model name
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def classify_geoscene_image(image):
    """Predicts geographic scene labels for an input image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.sigmoid(logits).squeeze().tolist()  # Sigmoid for multilabel

    labels = {
        "0": "Buildings and Structures",
        "1": "Desert",
        "2": "Forest Area",
        "3": "Hill or Mountain",
        "4": "Ice Glacier",
        "5": "Sea or Ocean",
        "6": "Street View"
    }

    threshold = 0.5
    predictions = {
        labels[str(i)]: round(probs[i], 3)
        for i in range(len(probs)) if probs[i] >= threshold
    }

    return predictions or {"None Detected": 0.0}

# Create Gradio interface
iface = gr.Interface(
    fn=classify_geoscene_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Predicted Scene Categories"),
    title="Multilabel-GeoSceneNet",
    description="Upload an image to detect multiple geographic scene elements (e.g., forest, ocean, buildings)."
)

if __name__ == "__main__":
    iface.launch()
```

---

## **Intended Use:**

The **Multilabel-GeoSceneNet** model is suitable for recognizing multiple geographic and structural elements in a single image. Use cases include:

- **Remote Sensing:** Label elements in satellite or drone imagery.
- **Geographic Tagging:** Auto-tagging images for search or sorting.
- **Environmental Monitoring:** Identify features like glaciers or forests.
- **Scene Understanding:** Help autonomous systems interpret complex scenes.
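
The sigmoid-and-threshold step used in the inference code can be looked at on its own, without loading the model. The block below is a minimal sketch of that decision rule, assuming one raw logit per class in the order listed in this card. `CLASS_NAMES`, `sigmoid`, and `select_labels` are hypothetical names introduced here for illustration and are not part of `transformers`; the example logits are made up, not real model output. It uses only the standard library in place of `torch.sigmoid`.

```python
import math

# Class order copied from the label list in this card.
CLASS_NAMES = [
    "Buildings and Structures",
    "Desert",
    "Forest Area",
    "Hill or Mountain",
    "Ice Glacier",
    "Sea or Ocean",
    "Street View",
]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability, branching on sign to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def select_labels(logits, threshold=0.5):
    """Return {label: probability} for every class at or above the threshold.

    Hypothetical helper for illustration. Each class is scored independently,
    so zero, one, or several labels can be returned for the same image.
    """
    if len(logits) != len(CLASS_NAMES):
        raise ValueError("expected one logit per class")
    probs = [sigmoid(v) for v in logits]
    return {
        name: round(p, 3)
        for name, p in zip(CLASS_NAMES, probs)
        if p >= threshold
    }


# Made-up logits, for illustration only (not real model output).
example_logits = [2.0, -3.0, -1.0, 0.5, -4.0, 1.2, -0.2]
print(select_labels(example_logits))
```

Because each class gets its own sigmoid rather than sharing a softmax, the probabilities do not sum to 1. Raising `threshold` trades recall for precision on every class at once; whether a single value of 0.5 is the best choice for all seven classes is not stated in this card.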