GAIA: A Foundation Model for Operational Atmospheric Dynamics

We present the GAIA (Geospatial Artificial Intelligence for Atmospheres) Foundation Model, a novel model that combines masked autoencoders (MAE) and Self-Distillation with NO labels (DINO) for analyzing global atmospheric patterns in satellite imagery. By integrating these complementary self-supervised learning approaches, our model simultaneously captures both local features and global dependencies, addressing two critical challenges in satellite data analysis: reconstructing missing regions and estimating precipitation patterns. The model demonstrates superior attention distribution and temporal pattern capture compared to standard MAE approaches, while maintaining robust performance in downstream tasks.

Architecture Overview

GAIA employs a transformer-based architecture specifically designed to handle spatio-temporal satellite data with the following attributes:

Backbone: Masked Autoencoder (MAE; He et al., 2021) combined with the DINO self-distillation architecture (Caron et al., 2021)
Multi-Objective Pre-training: Combining MAE and DINO pre-training objectives encourages deeper embeddings of satellite data
Self-Supervised Training: Self-supervised pre-training on masked satellite imagery
Resolution: Supports processing of medium-resolution geostationary satellite data (0.25 degree), with code that can be configured to adjust resolution and patch size
Input Channels: Single channel (long-wave infrared) from geostationary satellites
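
To make the resolution and patch-size attributes above concrete, the sketch below works out how many pixels and patches a single-channel image would contain. The 0.25 degree resolution comes from the list above and the 60° S to 60° N coverage from the limitations section; the patch size of 16 and mask ratio of 0.75 are assumptions borrowed from common MAE setups, not values confirmed for GAIA, and the function names are hypothetical. The model may also process smaller tiles rather than the full grid.

```python
def grid_and_patches(lat_south, lat_north, resolution_deg, patch_size):
    """Return (rows, cols, num_patches) for a regular lat/lon grid.

    Assumes full 360-degree longitude coverage and non-overlapping
    square patches; partial patches at the edges are dropped.
    """
    rows = round((lat_north - lat_south) / resolution_deg)
    cols = round(360.0 / resolution_deg)
    num_patches = (rows // patch_size) * (cols // patch_size)
    return rows, cols, num_patches


def num_masked(num_patches, mask_ratio):
    """Number of patches hidden from the encoder at a given mask ratio."""
    return int(num_patches * mask_ratio)


# Assumed values: patch_size=16 and mask_ratio=0.75 are illustrative only.
rows, cols, patches = grid_and_patches(-60.0, 60.0, 0.25, 16)
print(rows, cols, patches, num_masked(patches, 0.75))
```

Changing `resolution_deg` or `patch_size` changes the token count the transformer has to attend over, which is the main cost driver when reconfiguring either setting.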

Pre-trained Models

The base GAIA model is pre-trained on a comprehensive dataset of geostationary satellite observations from 2001-2015. The pre-trained model weights are available at this link.

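import torch

# GAIABase is defined in this repository; import it from the module that provides it.
# config_path is the path to the model configuration file.
model = GAIABase(config_path)

model.configure_model()
# map_location="cpu" lets the checkpoint load on machines without a GPU.
state_dict = torch.load("checkpoints/gaia-base-v1-1.pt", map_location="cpu")
model.load_state_dict(state_dict)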
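
If loading the weights fails with missing or unexpected keys, a common cause is a checkpoint whose weights are nested under a `state_dict` entry or whose keys carry a wrapper prefix. The helpers below are a minimal sketch for that situation. Whether the GAIA checkpoints need either step is not stated here, and the `"model."` prefix in the example is purely hypothetical.

```python
def unwrap_checkpoint(checkpoint):
    """Return the nested weights if the checkpoint stores them under 'state_dict'."""
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        return checkpoint["state_dict"]
    return checkpoint


def strip_prefix(state_dict, prefix):
    """Return a copy with `prefix` removed from every key that starts with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Plain values stand in for tensors; "model." is a hypothetical prefix.
raw = {"state_dict": {"model.encoder.weight": 1, "head.bias": 2}}
cleaned = strip_prefix(unwrap_checkpoint(raw), "model.")
print(sorted(cleaned))
```

The cleaned dictionary can then be passed to `load_state_dict` in place of the raw one.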

Downstream Models

We release code for all downstream tasks presented in the paper, along with checkpoints for the precipitation model.

1. Precipitation

Precipitation estimation with GAIA aims to provide accurate, high-resolution global precipitation rate predictions in near-real time using only geostationary satellite long-wave infrared (LW-IR) imagery. Unlike traditional approaches, which often require multiple satellite channels and complex retrieval algorithms, GAIA directly infers precipitation rates by extracting deep spatio-temporal features from IR imagery with its foundation model. This approach enables continuous, wide-area monitoring of precipitation—even in regions without ground-based radar or dense in-situ sensors—supporting meteorological and climate applications worldwide. The precipitation task utilizes feature embeddings from the GAIA base model to deliver robust estimates that improve upon standard pixel-wise IR-based methods in both spatial structure and total rainfall accuracy.

Code for the precipitation downstream task is available in downstream/precip, including a task-specific configuration and inference scripts. The task employs the model defined in downstream/precip/lightning_wrapper.py—a wrapper around the GAIA base model fine-tuned for precipitation estimation.

Model checkpoint: checkpoints/gaia-precip-v1-1.pt
Example inference notebook: notebooks/precip_inference.ipynb
Model configuration file: configs/precip_config.yaml
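
One way to quantify the "total rainfall accuracy" mentioned above is to compare an estimated precipitation field against a reference field. The sketch below computes a relative total bias and a root-mean-square error on flattened lists of rain rates. These two metrics and the function names are illustrative assumptions, not necessarily the evaluation used in the paper.

```python
import math


def total_rainfall_bias(estimated, reference):
    """Relative difference in summed rainfall: (sum(est) - sum(ref)) / sum(ref)."""
    total_ref = sum(reference)
    if total_ref == 0:
        raise ValueError("reference total is zero; relative bias is undefined")
    return (sum(estimated) - total_ref) / total_ref


def rmse(estimated, reference):
    """Root-mean-square error between two equally sized sequences of rain rates."""
    if len(estimated) != len(reference):
        raise ValueError("inputs must have the same length")
    squared = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(squared / len(estimated))


# Toy rain rates (e.g. mm/h) for three pixels; values are made up.
est = [0.0, 2.0, 6.0]
ref = [0.0, 2.0, 2.0]
print(total_rainfall_bias(est, ref), rmse(est, ref))
```

A bias near zero with a large RMSE would indicate correct totals but misplaced rainfall, which is why the two numbers are worth reading together.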

2. Atmospheric river segmentation

Atmospheric river (AR) segmentation with GAIA leverages the same core model architecture as precipitation estimation, differing only in configuration and task-specific parameters. GAIA identifies and segments atmospheric rivers in global geostationary satellite long-wave infrared (LW-IR) imagery by extracting spatio-temporal embeddings that capture the elongated moisture transport associated with AR events. This approach enables detection across diverse geographical regions and time periods, potentially supporting studies of extreme precipitation and hydrological impacts from atmospheric rivers worldwide. The model is trained to distinguish ARs based on these deep representations from IR imagery, and can operate even in areas lacking ground-based atmospheric observations.

Code for the AR segmentation downstream task is available in downstream/precip, using the same model architecture as precipitation. The task employs the model defined in downstream/precip/lightning_wrapper.py with AR-specific configuration parameters.

Model configuration file: configs/ar_config.yaml
Note: The AR segmentation task uses the same GAIA foundation model as precipitation estimation, but with task-specific configuration and argument values. Only the configuration file differs.
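
The relationship between the two tasks can be summarized in a small lookup table. The sketch below uses only the wrapper and configuration paths listed in this document; the task keys `"precip"` and `"ar"` and the helper name are hypothetical labels chosen for illustration.

```python
# Paths are taken from this README; the dictionary keys are illustrative.
TASKS = {
    "precip": {
        "wrapper": "downstream/precip/lightning_wrapper.py",
        "config": "configs/precip_config.yaml",
    },
    "ar": {
        "wrapper": "downstream/precip/lightning_wrapper.py",
        "config": "configs/ar_config.yaml",
    },
}


def differing_fields(task_a, task_b):
    """List the fields whose values differ between two task entries."""
    a, b = TASKS[task_a], TASKS[task_b]
    return sorted(key for key in a if a[key] != b[key])


print(differing_fields("precip", "ar"))
```

The only differing field is the configuration file, matching the note above.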

3. Tropical cyclone detection

Tropical cyclone (TC) detection and tracking with GAIA utilizes the same core foundation model, adapting it for identification and probabilistic localization of tropical cyclones in global geostationary satellite long-wave infrared (LW-IR) data. The approach extracts deep spatio-temporal features through GAIA’s transformer-based architecture, allowing accurate inference of cyclonic activity—including rapid intensification and cyclogenesis events—using only IR observations. GAIA’s TC detection model outputs cyclone probability heatmaps and can be further adapted for center localization and temporal tracking.

Code for the tropical cyclone downstream task is available in downstream/tropical_cyclone, including a task-specific configuration and inference scripts. The task employs the model defined in downstream/tropical_cyclone/lightning_wrapper.py—a wrapper around the GAIA base model fine-tuned for cyclone detection.
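
As a rough illustration of how a probability heatmap could be turned into a center location, the sketch below picks the most probable pixel and converts it to latitude and longitude. It is not the repository's localization code. It assumes row 0 lies at the northern edge (60° N), column 0 at 180° W, and a 0.25 degree grid; the actual grid orientation and origin used by GAIA may differ.

```python
def heatmap_peak(heatmap):
    """Return (row, col, probability) of the highest value in a 2D list."""
    best_row, best_col, best_prob = 0, 0, heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, prob in enumerate(row):
            if prob > best_prob:
                best_row, best_col, best_prob = r, c, prob
    return best_row, best_col, best_prob


def pixel_to_latlon(row, col, lat_north=60.0, lon_west=-180.0, resolution_deg=0.25):
    """Convert a pixel index to the lat/lon of the pixel center.

    Grid origin and orientation are assumptions, see the text above.
    """
    lat = lat_north - (row + 0.5) * resolution_deg
    lon = lon_west + (col + 0.5) * resolution_deg
    return lat, lon


# Toy 2x3 heatmap with made-up probabilities.
toy = [[0.1, 0.2, 0.1], [0.0, 0.9, 0.3]]
r, c, p = heatmap_peak(toy)
print(pixel_to_latlon(r, c), p)
```

Applying the same peak search to consecutive time steps would give a simple starting point for temporal tracking.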

Demo and Inference

Quick Start

Create a Virtual Environment (Recommended)

conda create -n gaia_env python=3.10 -y
conda activate gaia_env

Clone the Repository

git clone https://huggingface.co/bcg-usra-nasa-gaia/GAIA-v1
cd GAIA-v1

Download Checkpoints and Data

# Install git-lfs if not already installed
# For MacOS (using Homebrew):
brew install git-lfs

# For Ubuntu/Debian:
# sudo apt-get install git-lfs

# For Windows, download installer from https://git-lfs.com/

# Then, set up git-lfs (run once per machine)
git lfs install

# To download all files
git lfs pull

# To download a specific file with git-lfs, use:
git lfs pull -I <path/to/your/file>

# Example: Download only the GAIA precipitation checkpoint file
git lfs pull -I checkpoints/gaia-precip-v1-1.pt
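
If a checkpoint fails to load after cloning, the file may still be a Git LFS pointer (a small text stub) rather than the real weights. The sketch below checks for the standard pointer header; the helper name is hypothetical and the path in the usage comment is the precipitation checkpoint mentioned above.

```python
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


# Example usage (run from the repository root):
# if is_lfs_pointer("checkpoints/gaia-precip-v1-1.pt"):
#     print("Run `git lfs pull` to download the actual checkpoint.")
```

A pointer file is only a few hundred bytes, so an unexpectedly small checkpoint is another hint that `git lfs pull` has not been run.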

Install Dependencies
```
pip install -r requirements.txt
```
Run Inference Notebooks

Navigate to the notebooks directory to run the demo notebooks:
- For gap-filling:
```
cd notebooks
jupyter notebook gapfill_inference.ipynb
```
- For precipitation estimation:
```
jupyter notebook precip_inference.ipynb
```

Feedback

We welcome feedback and contributions! Please:

Open issues for bugs or feature requests
Submit pull requests for improvements
Share your use cases and results

Citation

If you use GAIA in your research, please cite:

@article{gaia-fm,
  title={GAIA: A Foundation Model for Operational Atmospheric Dynamics},
  author={Ata Akbari Asanjan and Olivia Alexander and Tom Berg and Stephen Peng and Jad Makki and Clara Zhang and Matt Yang and Disha Shidham and Srija Chakraborty and William Bender and Cara Crawford and Arun Ravindran and Olivier Raiman and David Potere and David Bell},
  year={2025},
  eprint={2505.18179},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.18179}, 
}

Copyright Notice:

Copyright © 2025 Boston Consulting Group and Universities Space Research Association. All rights reserved. Unauthorized use, reproduction, or distribution of this software is strictly prohibited unless it is licensed for use under terms in the “Apache 2.0” license.

Scope and limitations:

GAIA is released strictly as a research prototype meant to showcase how this methodology can learn useful representations from GOES data. It is aimed at research and educational communities to demonstrate the potential of MAEs and DINO in gap filling and precipitation estimation. The model is not a production-ready forecasting tool. It is intended to let researchers experiment in controlled, academic settings.

In offline experiments, GAIA’s embeddings have proved valuable for gap filling of infrared imagery and precipitation estimation. The model shows promise and could be extended to further tasks, such as identifying atmospheric rivers or tropical cyclones. However, the model was trained only on infrared channels, covers 60° S–60° N between 2001 and 2015, and has not been tested outside of this data. The method may also produce over-smoothed outputs due to the pattern continuity method. Consequently, any quantitative conclusions drawn from GAIA should be treated as diagnostic and must be cross-checked.

Because of these limitations, GAIA must not be used in safety-critical or high-stakes settings such as flight planning, maritime routing, disaster response, or financial and insurance decisions. It should not be repurposed for other applications without extensive additional work. Anyone who wishes to deploy it operationally would need to complete tasks such as retraining or fine-tuning on up-to-date sensor data, conducting rigorous calibration and out-of-distribution testing, applying rigorous human oversight, and obtaining any required regulatory approvals (e.g., from the FAA or national meteorological agencies).
