AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
Abstract
Geospatial models must adapt to the diversity of Earth observation data in terms of resolutions, scales, and modalities. However, existing approaches expect fixed input configurations, which limits their practical applicability. We propose AnySat, a multimodal model based on the joint embedding predictive architecture (JEPA) and resolution-adaptive spatial encoders, allowing us to train a single model on highly heterogeneous data in a self-supervised manner. To demonstrate the advantages of this unified approach, we compile GeoPlex, a collection of 5 multimodal datasets with varying characteristics and 11 distinct sensors. We then train a single powerful model on these diverse datasets simultaneously. Once fine-tuned, we achieve better or near state-of-the-art results on the datasets of GeoPlex and 4 additional ones across 5 environmental monitoring tasks: land cover mapping, tree species identification, crop type classification, change detection, and flood segmentation. The code and models are available at https://github.com/gastruc/AnySat.
Community
Key Features:
Versatile Model: Handles diverse datasets spanning 3–11 channels, tiles ranging from 0.3 to 2600 hectares, and any combination of 11 sensors.
Simple to Use: Install and download AnySat with a single line of code, select your desired modalities and patch size, and immediately generate rich features.
Flexible Task Adaptation: Supports fine-tuning and linear probing for tasks like tile-wise classification and semantic segmentation.
Multi-dataset Training: Trains a single model across multiple datasets with varying characteristics.
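To make the "single line of code" workflow above concrete, here is a minimal sketch of loading AnySat and preparing an input. The `torch.hub` entrypoint name (`"gastruc/anysat"`, `"anysat"`), the input-dict keys, and the forward-call arguments (`patch_size`, `output`) are assumptions based on the project README, not verified here; check the repository for the exact API.

```python
import torch

# Example input: a Sentinel-2 time series for one tile.
# Assumed shape convention: (batch, time, channels, height, width),
# plus per-sample acquisition dates given as day of year.
data = {
    "s2": torch.randn(1, 12, 10, 6, 6),         # 12 dates, 10 bands, 6x6 pixels
    "s2_dates": torch.randint(0, 365, (1, 12)),  # acquisition days of year
}

# Hypothetical hub entrypoint and forward signature (see repo README):
# model = torch.hub.load("gastruc/anysat", "anysat", pretrained=True)
# features = model(data, patch_size=10, output="tile")  # one embedding per tile
```

Because the model accepts a dictionary keyed by modality, adding another sensor (e.g. Sentinel-1) should only require adding the corresponding tensors to `data` rather than changing the model.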
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model for Earth Observation Applications (2024)
- Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing (2024)
- MapSAM: Adapting Segment Anything Model for Automated Feature Detection in Historical Maps (2024)
- Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images (2024)
- Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation (2024)
- C-DiffSET: Leveraging Latent Diffusion for SAR-to-EO Image Translation with Confidence-Guided Reliable Object Generation (2024)
- From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing (2024)