GAIA: A Foundation Model for Operational Atmospheric Dynamics

We present the GAIA (Geospatial Artificial Intelligence for Atmospheres) Foundation Model, which combines masked autoencoders (MAE) with self-distillation with no labels (DINO) to analyze global atmospheric patterns in satellite imagery. By integrating these complementary self-supervised learning approaches, the model captures both local features and global dependencies, addressing two critical challenges in satellite data analysis: reconstructing missing regions and estimating precipitation patterns. Compared to standard MAE approaches, the model distributes attention more effectively and captures temporal patterns better, while maintaining robust performance in downstream tasks.

Architecture Overview

GAIA employs a transformer-based architecture specifically designed to handle spatio-temporal satellite data with the following attributes:

- Backbone: Masked autoencoder (MAE; He et al. 2021) combined with DINO self-distillation (Caron et al. 2021)
- Local-Global Learning: MAE handles local patterns while DINO captures global atmospheric dynamics
- Self-Supervised Training: Pre-trained on masked satellite imagery without labels
- Training Strategy: The MAE and DINO losses are combined to learn local and global relationships in geostationary satellite data
- Resolution: Processes medium-resolution geostationary satellite data (0.25 degree)
- Input Channels: Single channel (long-wave infrared) from geostationary satellites
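
The training strategy above can be sketched as a weighted sum of a masked-reconstruction loss and a self-distillation loss. This is a minimal illustration in plain Python, not the GAIA training code: the function names, the `dino_weight` parameter, and the 75% mask ratio are assumptions made for the example, and the temperatures (0.1 for the student, 0.04 for the teacher) follow the defaults reported in the DINO paper rather than any confirmed GAIA setting.

```python
import math
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return one boolean per patch; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    hidden = set(rng.sample(range(num_patches), num_masked))
    return [i in hidden for i in range(num_patches)]


def mae_loss(pred, target, mask):
    """Mean squared pixel error, averaged over the masked patches only."""
    per_patch = [
        sum((p - t) ** 2 for p, t in zip(pred[i], target[i])) / len(target[i])
        for i, is_masked in enumerate(mask) if is_masked
    ]
    return sum(per_patch) / len(per_patch)


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def dino_loss(student_logits, teacher_logits, center, t_student=0.1, t_teacher=0.04):
    """Cross-entropy from the centered, sharpened teacher output to the student output."""
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(v) for v in log_softmax(centered, t_teacher)]
    student_log_probs = log_softmax(student_logits, t_student)
    return -sum(q * lp for q, lp in zip(teacher_probs, student_log_probs))


def combined_loss(pred, target, mask, student_logits, teacher_logits, center, dino_weight=1.0):
    """Weighted sum of the local (MAE) and global (DINO) objectives."""
    local_term = mae_loss(pred, target, mask)
    global_term = dino_loss(student_logits, teacher_logits, center)
    return local_term + dino_weight * global_term


# Toy data: 8 patches of 4 pixels each, with 75% of the patches masked.
rng = random.Random(0)
mask = random_patch_mask(num_patches=8, mask_ratio=0.75, rng=rng)
target = [[rng.random() for _ in range(4)] for _ in range(8)]
pred = [[value + 0.1 for value in patch] for patch in target]
loss = combined_loss(pred, target, mask, [1.0, 0.5, -0.5], [0.9, 0.6, -0.4], [0.0, 0.0, 0.0])
print(f"combined loss: {loss:.4f}")
```

In a real training loop the teacher would typically be an exponential moving average of the student and the center would be updated from teacher outputs, as described by Caron et al. 2021; both are held fixed here to keep the sketch short.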

Pre-trained Models

The base GAIA model is pre-trained on a comprehensive dataset of geostationary satellite observations from 2001-2015. The pre-trained model weights are available at this link.

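import torch

# GAIABase is the model class from this repository; config_path is the path to its config file.
model = GAIABase(config_path)

# Configure the model, then load the pre-trained weights (onto CPU first, so no GPU is required).
model.configure_model()
state_dict = torch.load("checkpoints/gaia-v1.pt", map_location="cpu")
model.load_state_dict(state_dict)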

Downstream Tasks

The pre-trained model is adapted to the following downstream tasks.

1. Gap Filling in Operational Data

GAIA has been fine-tuned to address operational gaps in satellite data:

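import torch

# GAIAGapFill is the gap-filling model class from this repository; config_path is the path to its config file.
model = GAIAGapFill(config_path)

# Configure the model, then load the fine-tuned weights (onto CPU first, so no GPU is required).
model.configure_model()
state_dict = torch.load("checkpoints/gaia_gapfill.pt", map_location="cpu")
model.load_state_dict(state_dict)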

The results showcase several key strengths of our approach:

- Large Gap Reconstruction: The model successfully reconstructs substantial missing regions while preserving the temperature gradients and atmospheric patterns consistent with surrounding areas.
- Pattern Continuity: The reconstructions maintain smooth transitions between filled regions and original data, avoiding artificial boundaries or discontinuities.
- Detail Preservation: Fine-scale features such as cloud formations and temperature variations are accurately reproduced, suggesting the model has learned meaningful representations of atmospheric physics rather than simple interpolation.
- Visible Patch Reconstruction: The reconstruction closely replicates the fine-grained details of the input image, suggesting that the encoder produces rich latent representations that preserve input detail.
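
One way to examine gap-filling behavior on your own data is to simulate a gap on a complete image, composite the model output into the gap, and score only the filled pixels. The sketch below is an illustration under assumptions, not the evaluation used by the authors: `composite_gap_fill` and `gap_rmse` are hypothetical helper names, and the small arrays stand in for brightness temperatures in kelvin.

```python
import math


def composite_gap_fill(observed, reconstructed, missing):
    """Keep observed pixels; use the reconstruction only where data is missing."""
    return [
        [rec if gap else obs for obs, rec, gap in zip(obs_row, rec_row, gap_row)]
        for obs_row, rec_row, gap_row in zip(observed, reconstructed, missing)
    ]


def gap_rmse(truth, reconstructed, missing):
    """Root-mean-square error computed over the gap pixels only."""
    squared = [
        (t - r) ** 2
        for t_row, r_row, g_row in zip(truth, reconstructed, missing)
        for t, r, g in zip(t_row, r_row, g_row) if g
    ]
    return math.sqrt(sum(squared) / len(squared))


# Complete reference image and a simulated gap (True = missing pixel).
truth = [[280.0, 282.0, 284.0], [279.0, 281.0, 283.0]]
missing = [[False, True, False], [False, True, True]]

# What the sensor would deliver: NaN wherever the gap is.
observed = [
    [float("nan") if gap else value for value, gap in zip(t_row, g_row)]
    for t_row, g_row in zip(truth, missing)
]

# Stand-in for a model output covering the full image.
reconstructed = [[280.5, 281.0, 284.5], [279.5, 282.0, 283.0]]

filled = composite_gap_fill(observed, reconstructed, missing)
rmse = gap_rmse(truth, reconstructed, missing)
print(filled)
print(f"gap RMSE: {rmse:.3f} K")
```

Compositing this way leaves valid observations untouched, so any discontinuity at the gap boundary comes from the reconstruction itself and can be inspected directly.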

2. Precipitation Estimation

GAIA was fine-tuned to learn the relationship between Geostationary Operational Environmental Satellite (GOES) infrared (IR) imagery and precipitation data:

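import torch

# GAIAPrecip is the precipitation model class from this repository; config_path is the path to its config file.
model = GAIAPrecip(config_path)

# Configure the model, then load the fine-tuned weights (onto CPU first, so no GPU is required).
model.configure_model()
state_dict = torch.load("checkpoints/gaia_precip.pt", map_location="cpu")
model.load_state_dict(state_dict)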

The model effectively captures the spatial distribution and intensity patterns of precipitation across a variety of atmospheric scenarios. Its precipitation predictions achieve a structural similarity index measure (SSIM) of 0.881.
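
For readers unfamiliar with SSIM, the sketch below computes a simplified, single-window version of the metric. Standard SSIM averages the same expression over local sliding windows, and the window settings behind the reported score are not stated here, so this global variant will not reproduce that number. The function name `global_ssim` and the toy fields are assumptions for illustration; the constants 0.01 and 0.03 are the conventional SSIM stabilizers.

```python
def global_ssim(image_a, image_b, data_range):
    """Single-window SSIM over two equally sized 2D images (lists of rows)."""
    a = [value for row in image_a for value in row]
    b = [value for row in image_b for value in row]
    n = len(a)

    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    cov_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

    # Small constants that keep the ratio stable when means or variances are near zero.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    numerator = (2 * mean_a * mean_b + c1) * (2 * cov_ab + c2)
    denominator = (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Toy precipitation fields (e.g. mm/h); the prediction overestimates one pixel.
reference = [[0.0, 1.0], [2.0, 3.0]]
prediction = [[0.0, 1.0], [2.0, 5.0]]

same_score = global_ssim(reference, reference, data_range=5.0)
pred_score = global_ssim(reference, prediction, data_range=5.0)
print(f"identical fields: {same_score:.3f}, prediction: {pred_score:.3f}")
```

A score of 1 indicates identical fields; lower values reflect differences in mean intensity, contrast, or spatial structure.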

Demo and Inference

Quick Start

Create a Virtual Environment (Recommended)

conda create -n gaia_env python=3.10 -y
conda activate gaia_env

Clone the Repository

git clone https://huggingface.co/bcg-usra-nasa-gaia/GAIA-v1
cd GAIA-v1

Install Dependencies
```
pip install -r requirements.txt
```
Run Inference Notebooks

Navigate to the notebooks directory to run the demo notebooks:
- For gap-filling:
```
cd notebooks
jupyter notebook gap_filling_inference.ipynb
```
- For precipitation estimation:
```
jupyter notebook precipitation_inference.ipynb
```

Feedback

We welcome feedback and contributions! Please:

- Open issues for bugs or feature requests
- Submit pull requests for improvements
- Share your use cases and results

Citation

If you use GAIA in your research, please cite:

@article{gaia-fm,
  title={GAIA: A Foundation Model for Operational Atmospheric Dynamics},
  author={Ata Akbari Asanjan and Olivia Alexander and Tom Berg and Clara Zhang and Matt Yang and Jad Makki and Disha Shidham and Srija Chakraborty and William Bender and Stephen Peng and Arun Ravindran and Olivier Raiman and David Potere and David Bell},
  year={2025},
  eprint={2505.18179},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.18179}, 
}

Copyright Notice:

Copyright © 2025 Boston Consulting Group and Universities Space Research Association. All rights reserved. Unauthorized use, reproduction, or distribution of this software is strictly prohibited unless it is licensed for use under terms in the “Apache 2.0” license.

Scope and limitations:

GAIA is released strictly as a research prototype meant to showcase how this methodology can learn useful representations from GOES data. It is aimed at research and educational communities to demonstrate the potential of MAEs and DINO in gap filling and precipitation estimation. The model is not a production-ready forecasting tool. It is intended to let researchers experiment in controlled, academic settings.

In offline experiments, GAIA’s embeddings have proved valuable for gap filling of infrared imagery and for precipitation estimation. The model shows promise and could be extended to further tasks, such as identifying atmospheric rivers or tropical cyclones. However, the model was trained only on infrared data covering 60° S–60° N between 2001 and 2015 and has not been tested outside of this data. Its emphasis on pattern continuity may also produce over-smoothed outputs. Consequently, any quantitative conclusions drawn from GAIA should be treated as diagnostic and must be cross-checked.
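
As a simple safeguard in experiments, inputs can be checked against the stated training coverage before results are interpreted. The helper below is a hypothetical sketch, not part of the GAIA code base: it encodes only the latitude band and year range given above, and the name `coverage_warnings` is an assumption.

```python
# Training coverage as stated in this section.
TRAIN_LAT_RANGE = (-60.0, 60.0)  # degrees north, i.e. 60S to 60N
TRAIN_YEARS = (2001, 2015)       # inclusive


def coverage_warnings(lat_min, lat_max, year):
    """List the ways an input falls outside the data the model was trained on."""
    warnings = []
    if lat_min < TRAIN_LAT_RANGE[0] or lat_max > TRAIN_LAT_RANGE[1]:
        warnings.append("latitude extends beyond 60S-60N")
    if not TRAIN_YEARS[0] <= year <= TRAIN_YEARS[1]:
        warnings.append("year is outside 2001-2015")
    return warnings


inside = coverage_warnings(-60.0, 60.0, 2010)
outside = coverage_warnings(-75.0, 60.0, 2024)
print(inside)
print(outside)
```

An empty list means the input lies within the stated coverage; it does not by itself indicate that the model output is reliable.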

Because of these limitations, GAIA must not be used in safety-critical or high-stakes settings such as flight planning, maritime routing, disaster response, or financial and insurance decisions. It should not be repurposed for other applications without extensive additional work. Anyone who wishes to deploy it operationally would need to complete tasks such as retraining or fine-tuning on up-to-date sensor data, conducting rigorous calibration and out-of-distribution testing, applying strict human oversight, and obtaining any required regulatory approvals (e.g., from the FAA or national meteorological agencies).