---
tags:
- walrus
- foundation-model
- physics
- continuum-dynamics
- transformer
- PDE
datasets:
- polymathic-ai/shear_flow
- polymathic-ai/gray_scott_reaction_diffusion
- polymathic-ai/active_matter
- polymathic-ai/turbulent_radiative_layer_2D
- polymathic-ai/supernova_explosion_64
- polymathic-ai/turbulence_gravity_cooling
- polymathic-ai/rayleigh_benard
- polymathic-ai/planetswe
- polymathic-ai/acoustic_scattering_inclusions
- polymathic-ai/MHD_64
- polymathic-ai/rayleigh_taylor_instability
- polymathic-ai/acoustic_scattering_discontinuous
- polymathic-ai/acoustic_scattering_maze
- polymathic-ai/helmholtz_staircase
- polymathic-ai/viscoelastic_instability
- BGLab/FlowBench
license: mit
---

# Walrus: A Cross-Domain Foundation Model for Continuum Dynamics

[License: MIT](https://opensource.org/licenses/MIT)
[Code](https://github.com/PolymathicAI/walrus)
[Paper](https://arxiv.org/abs/2511.15684)

Walrus is a large-scale **physics foundation model** capable of modeling a broad range of continuum dynamical systems.

Walrus is trained jointly across **19 diverse physical domains** spanning:
- astrophysics
- geoscience
- rheology
- plasma physics
- acoustics
- classical fluids

These systems have diverse boundary conditions and physical parameterizations. The model is optimized to serve as a **general-purpose surrogate** for physical simulation and a **strong initialization** for downstream fine-tuning on new PDE systems.
---

# Model Description
Walrus is a **1.3B-parameter space–time Transformer** trained autoregressively to predict the temporal evolution of physical fields. A simulation snapshot at time t is written as u(t).

We define the difference between two consecutive snapshots as:

u(t+1) = u(t+1) − u(t)

Given a short history of τ snapshots:

U(t) = [u(t − τ + 1), ..., u(t)]

the model M predicts the next state as:

u(t+1) ≈ u(t) + M(U(t))
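The residual update above can be turned into an autoregressive rollout by feeding each prediction back into the history window. A minimal sketch follows; `toy_model` is a stand-in for the real network, and the function names are illustrative rather than the Walrus API:

```python
import numpy as np

def rollout(model, history, n_steps):
    # Autoregressive rollout: at each step the model predicts the residual
    # Δu(t+1) = u(t+1) − u(t) from the most recent history window U(t).
    history = list(history)
    tau = len(history)
    states = []
    for _ in range(n_steps):
        U = np.stack(history[-tau:])    # U(t) = [u(t − τ + 1), ..., u(t)]
        delta = model(U)                # M(U(t)) ≈ Δu(t+1)
        u_next = history[-1] + delta    # u(t+1) = u(t) + M(U(t))
        states.append(u_next)
        history.append(u_next)          # predictions re-enter the window
    return states

# Toy stand-in "model": persist the last observed difference.
toy_model = lambda U: U[-1] - U[-2]
traj = rollout(toy_model, [np.zeros(4), np.ones(4)], n_steps=3)
```

Because each output becomes the next input, errors compound over the rollout, which is why the long-horizon stability techniques below matter.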
### Key architectural components

- **Adaptive-compute patch embedding**
  - Token count automatically balanced across resolutions
  - Enables mixing 2D and 3D datasets efficiently

- **Patch jittering**
  - A harmonic-analysis–motivated augmentation technique
  - Reduces aliasing and spectral artifacts
  - Improves long-horizon stability on 17 of the 19 pretraining datasets

- **Tensor-law–aware data augmentation**
  - 2D data embedded into 3D through plane rotations
  - Vector/tensor fields rotated with the correct physical transformations

- **Asymmetric normalization**
  - Inputs are normalized by their RMS over space-time
  - The predicted Δu is de-normalized using the RMS of Δu
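The asymmetric normalization scheme can be illustrated with a minimal sketch: the input window is scaled by its own space-time RMS, while the predicted residual is rescaled by the RMS statistics of past differences. The `eps` constant and the function names here are illustrative assumptions, not Walrus internals:

```python
import numpy as np

def rms(x, eps=1e-8):
    # Root-mean-square over all axes (space and time).
    # eps is an illustrative stabilizer, not a documented Walrus constant.
    return np.sqrt(np.mean(np.square(x))) + eps

def predict_step(model, U, past_deltas):
    # Asymmetric normalization: inputs and outputs use different scales.
    in_scale = rms(U)                       # RMS of the input window
    delta_scale = rms(past_deltas)          # RMS of past Δu
    delta_hat = model(U / in_scale)         # model sees normalized inputs
    return U[-1] + delta_scale * delta_hat  # de-normalize Δu, apply residual
```

Scaling the residual by Δu statistics rather than input statistics keeps the prediction well-conditioned even when the per-step change is much smaller than the field magnitude.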

---

# Pretraining Details

Walrus is pretrained on 19 physical datasets with:
- **Loss**: Per-field normalized L1 loss
- **Optimizer**: AdamW
- **Batching**: System-uniform hierarchical sampling
- **Time striding**: Random stride (1–5) per training example
- **Patch jitter range**: Uniform per-axis random offset
- **Dimensional unification**: 2D fields embedded as thin 3D volumes
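The random time-striding step can be sketched as follows: a stride is drawn per example, and the history plus target frames are sampled with that spacing. The clipping and window-placement details here are illustrative assumptions, not the exact pretraining pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_window(trajectory, tau, max_stride=5):
    # Random time striding: draw a stride s in [1, max_stride], then take
    # tau history frames plus one target frame, all spaced s steps apart.
    T = len(trajectory)
    s = int(rng.integers(1, max_stride + 1))
    s = min(s, (T - 1) // tau)                 # keep the window inside the data
    start = int(rng.integers(0, T - tau * s))  # random placement of the window
    idx = start + s * np.arange(tau + 1)
    frames = [trajectory[i] for i in idx]
    return frames[:-1], frames[-1]             # history U(t), target u(t+1)
```

Varying the stride exposes the model to multiple effective time scales from the same stored trajectories.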

The model was pretrained on 96 **NVIDIA H100 GPUs** using distributed HSDP (4 GPUs per shard group), with data sampling matched to the sharding structure to minimize wasted compute.
---

# Intended Use

This pretrained checkpoint is suitable for:

### ✔ Next-step prediction
### ✔ Fast surrogate simulation
### ✔ Autoregressive rollout of physical systems
### ✔ Transfer learning to new physical settings
# Resources

Paper: https://arxiv.org/pdf/2511.15684
GitHub: https://github.com/PolymathicAI/walrus
Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks
Note: the training code in the repository is closely coupled with tools from [the Well](https://github.com/PolymathicAI/the_well), so it can be beneficial to format data to match that schema. If that's not possible, the tutorial shows how to use the model without Well-formatted data.

# Demonstrated downstream tasks

We demonstrate the strong performance of Walrus by finetuning it on a range of challenging downstream tasks, as shown in the paper. The finetuned Walrus checkpoints for these tasks are available at the following paths:
### PDEGym CE-RM: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main
### PDEBench CNS Turbulent: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main
### PDEBench CNS Random: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main
### FlowBench FPOSkelenton: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main
### The Well Postmerger Neutron Star: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main
### The Well Convective Envelope RSG: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main
### PDEArena Conditioned Incompressible NS: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main
### BubbleML 2.0 PoolBoil Subcooled: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main

Additional checkpoints not included in the Walrus collection on HF can be found [here](https://users.flatironinstitute.org/~polymathic/data/walrus_project_checkpoints/), though the endpoint is a bit finicky.

More finetuning checkpoints will continue to be added to HF over time.