Insight about Prithvi

#13
by AnuIdame - opened

We aim to understand the architecture of Prithvi-100M, including
a) How to deal with heterogeneous data. In this case, different resolutions/picture sizes and possibly different numbers of channels
b) How to deal with metadata, such as spatial positioning of the image
c) What goes into pre-training and what into fine-tuning
Can you help us understand these details?

IBM NASA Geospatial org

Hello, dealing with your questions in order:
a) The model is trained for a specific resolution, and deviating from that will likely result in poor performance. Changing the number of input channels is not possible without re-training.
b) The model itself is location-agnostic in that spatial metadata is not an input.
c) Pre-training is self-supervised and sees many more images than the fine-tuning procedure does. During pre-training, the model masks parts of the input images and then reconstructs them, learning relationships within the inputs. During fine-tuning, the model is provided fewer but labelled images and learns to predict those labels.
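The masked-reconstruction idea described in (c) can be sketched without any deep-learning framework. The numbers below (6 bands, 224x224 images, 16x16 patches, 75% mask ratio) are illustrative assumptions, not values taken from the model card, and the zero "prediction" stands in for what a real encoder-decoder would produce:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: one 6-band image of 224x224 split into 16x16 patches,
# mimicking the masked-reconstruction pre-training described above.
bands, size, patch = 6, 224, 16
n_patches = (size // patch) ** 2          # 14 * 14 = 196 patches
image = rng.normal(size=(bands, size, size))

# Flatten each patch into a token vector (bands * patch * patch values).
tokens = image.reshape(bands, size // patch, patch, size // patch, patch)
tokens = tokens.transpose(1, 3, 0, 2, 4).reshape(n_patches, -1)

# Randomly mask 75% of the patches; only the visible 25% would be encoded.
mask_ratio = 0.75
n_masked = int(n_patches * mask_ratio)
perm = rng.permutation(n_patches)
masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]

# A real model would encode the visible tokens and decode a prediction for
# the masked ones; here a stand-in "prediction" is simply zeros.
prediction = np.zeros_like(tokens)

# The self-supervised loss is computed only on the masked patches,
# so the model must infer missing content from what remains visible.
loss = np.mean((prediction[masked_idx] - tokens[masked_idx]) ** 2)
print(n_masked, len(visible_idx), round(float(loss), 3))
```

The key point the sketch shows is that no labels are needed: the image itself is the training target, which is why pre-training can consume far more data than fine-tuning.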

@CEPhillips Thank you very much for the explanation. We have two more questions as well.

  1. Does the pre-training dataset include data such as burn-scar and flood detection?
  2. It is stated that the pre-training images are time series where each input has three time stamps. What is the time gap between them? Do these three images have any spatial shift, or do they overlap perfectly (i.e., are they of the exact same location)?
IBM NASA Geospatial org

I'm glad that helps. The pre-training does not include burn-scar or flood detection tasks; those are handled during fine-tuning. Regarding the input data, HLS observations are tiled and there are no spatial shifts between times. It is my understanding that the times can vary somewhat due to irregular overpasses by the satellite.

carlosgomes98 changed discussion status to closed
AnuIdame changed discussion status to open

@CEPhillips The description states "The model can also handle static imagery which can be fed into the model with T=1." Do you feed T=1 images during pre-training? If so, how does the network handle T=1 and T=3 differently?

IBM NASA Geospatial org

Hi @AnuIdame ! For pretraining we always fed 3 images. However, as you can see from the downstream tasks we tackled, once you use the pretrained encoder you can customise it to the number of time steps (T) you want.
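One way to see why the encoder tolerates a different T: a ViT-style encoder treats the input as a sequence of space-time patch tokens, so changing T mainly changes the sequence length, not the weights. A rough sketch of the token count (the 16x16 patch size matches the model card; treating each time step as its own slice of tokens is an assumption about the patch embedding, not a confirmed detail):

```python
def num_tokens(t: int, h: int, w: int, patch: int = 16) -> int:
    """Number of patch tokens a ViT-style encoder sees for a T x H x W input,
    assuming each time step contributes its own (H/patch) x (W/patch) grid."""
    return t * (h // patch) * (w // patch)

# Pre-training uses three time steps; a static image is just a shorter sequence.
print(num_tokens(3, 224, 224))  # 588 tokens
print(num_tokens(1, 224, 224))  # 196 tokens
```

Since transformer weights are shared across sequence positions, the same encoder can process either sequence; only components tied to T (such as temporal position embeddings) need adjusting downstream.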

Hello, can someone help me fine-tune this model to predict something? How can I start? I want to do it on Google Colab but don't know where to begin: how do I import the model there, and what do I do after that? Any advice would be highly appreciated.
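The general recipe discussed earlier in this thread is: download the pretrained checkpoint (e.g. from the Hugging Face Hub), keep the encoder frozen or lightly tuned, and train a small task head on your labelled data. Below is a minimal, framework-free sketch of that idea only, where a random projection stands in for the frozen pretrained encoder and a logistic-regression head is trained on top; none of the names here come from the actual Prithvi code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for a frozen pretrained encoder: it maps each
# input vector to a fixed feature vector (here just a random projection).
W_frozen = rng.normal(size=(32, 8))
def frozen_encoder(x):
    return np.tanh(x @ W_frozen)

# Small labelled dataset: fine-tuning is where the labels come in.
X = rng.normal(size=(200, 32))
y = (X[:, 0] > 0).astype(float)

# Trainable task head: logistic regression on the frozen features.
feats = frozen_encoder(X)
w, b = np.zeros(8), 0.0
lr = 0.5
for _ in range(300):
    p = 1 / (1 + np.exp(-(feats @ w + b)))   # sigmoid
    w -= lr * feats.T @ (p - y) / len(y)     # gradient step on the head only
    b -= lr * np.mean(p - y)                 # the encoder stays untouched

acc = float(np.mean((p > 0.5) == y))
print(round(acc, 2))
```

In practice you would replace the stand-in encoder with the real pretrained weights loaded in PyTorch and the logistic head with a segmentation or classification head, but the division of labour (frozen or slowly-tuned encoder, freshly trained head on labelled data) is the same.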
