EO-VAE: Towards A Multi-sensor Tokenizer for Earth Observation Data
Paper: arXiv:2602.12177
Earth Observation Variational Autoencoder (EO-VAE). This repository is self-contained: it needs no external eo-vae package and uses only torch, safetensors, and huggingface_hub.
import sys
import torch
from huggingface_hub import snapshot_download
model_dir = snapshot_download("BiliSakura/EO-VAE")
sys.path.insert(0, model_dir)  # make the bundled eo_vae module importable
from eo_vae import EOVAEModel, WAVELENGTHS
vae = EOVAEModel.from_pretrained(model_dir, torch_dtype=torch.float32, device="cpu")
x = torch.randn(1, 3, 256, 256)  # dummy S2RGB batch, [B, C, H, W]
wvs = torch.tensor(WAVELENGTHS["S2RGB"], dtype=torch.float32)  # one wavelength per channel
with torch.no_grad():
    recon = vae.reconstruct(x, wvs)
    z = vae.encode_spatial_normalized(x, wvs)  # [1, 32, 32, 32]
| Property | Value |
|---|---|
| Architecture | Flux-style VAE with wavelength-conditioned dynamic convolutions |
| Input | [B, C, 256, 256] + wavelengths [C] |
| Latent | [B, 32, 32, 32] (spatial) or [B, 128, 16, 16] (packed) |
| Modalities | S2RGB, S2L2A, S1RTC, S1GRD |
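
The two latent layouts in the table are consistent with an 8x spatial downsampling to 32 channels, optionally followed by a 2x2 patch packing of the kind used in Flux-style models. The sketch below reproduces only that shape arithmetic. The downsampling factor, the patch size, and the helper names are assumptions inferred from the table, not taken from the model code.

```python
# Shape arithmetic only; no model weights or torch are needed.
# Assumed from the table: 8x downsampling, 32 latent channels, 2x2 packing.
LATENT_CHANNELS = 32
DOWNSAMPLE = 8
PATCH = 2


def spatial_latent_shape(batch, height, width):
    """Shape of the spatial latent for an input of [batch, C, height, width]."""
    if height % DOWNSAMPLE or width % DOWNSAMPLE:
        raise ValueError("height and width must be multiples of 8")
    return (batch, LATENT_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE)


def packed_latent_shape(batch, height, width):
    """Shape after folding each 2x2 latent patch into the channel axis."""
    b, c, h, w = spatial_latent_shape(batch, height, width)
    if h % PATCH or w % PATCH:
        raise ValueError("latent height and width must be multiples of 2")
    return (b, c * PATCH * PATCH, h // PATCH, w // PATCH)


print(spatial_latent_shape(1, 256, 256))  # (1, 32, 32, 32)
print(packed_latent_shape(1, 256, 256))   # (1, 128, 16, 16)
```

For the documented 256 x 256 input, both results match the Latent row of the table. Other input sizes are outside what the table states.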

Per-channel wavelengths by modality (Sentinel-2 values are central wavelengths in µm):

| Modality | Wavelengths |
|---|---|
| S2RGB | 0.665, 0.560, 0.490 |
| S1RTC | 5.4, 5.6 |
| S2L2A | 0.443, 0.490, 0.560, 0.665, 0.705, 0.740, 0.783, 0.842, 0.865, 1.610, 2.190, 0.945 |
| S2L1C | 0.443, 0.490, 0.560, 0.665, 0.705, 0.740, 0.783, 0.842, 0.865, 0.945, 1.375, 1.610, 2.190 |
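
Because the model takes a `[B, C, 256, 256]` tensor together with a wavelength vector of length `C`, it can help to check the channel count before calling it. The following is a minimal sketch of such a check, using the values from the table above. `TABLE_WAVELENGTHS` and `wavelengths_for` are hypothetical names for illustration and are not part of the shipped `eo_vae` module.

```python
# Values copied from the wavelength table; TABLE_WAVELENGTHS is an
# illustrative name, not the WAVELENGTHS object shipped with the model.
TABLE_WAVELENGTHS = {
    "S2RGB": [0.665, 0.56, 0.49],
    "S1RTC": [5.4, 5.6],
    "S2L2A": [0.443, 0.490, 0.560, 0.665, 0.705, 0.740,
              0.783, 0.842, 0.865, 1.610, 2.190, 0.945],
    "S2L1C": [0.443, 0.490, 0.560, 0.665, 0.705, 0.740,
              0.783, 0.842, 0.865, 0.945, 1.375, 1.610, 2.190],
}


def wavelengths_for(modality, num_channels):
    """Return the wavelength list for a modality, checking the channel count."""
    try:
        values = TABLE_WAVELENGTHS[modality]
    except KeyError:
        raise ValueError(f"unknown modality: {modality!r}") from None
    if len(values) != num_channels:
        raise ValueError(
            f"{modality} expects {len(values)} channels, got {num_channels}"
        )
    return list(values)


wvs = wavelengths_for("S2L2A", 12)  # one wavelength per input channel
print(len(wvs))
```

The returned list can then be wrapped with `torch.tensor(..., dtype=torch.float32)` in the same way as `WAVELENGTHS["S2RGB"]` in the usage snippet.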
@article{eo-vae,
title={EO-VAE: Towards A Multi-sensor Tokenizer for Earth Observation Data},
author={Lehmann, Nils and Wang, Yi and Xiong, Zhitong and Zhu, Xiaoxiang},
journal={arXiv preprint arXiv:2602.12177},
year={2026}
}
Base model: nilsleh/eo-vae