---
tags:
- walrus
- foundation-model
- physics
- continuum-dynamics
- transformer
- PDE
datasets:
- polymathic-ai/shear_flow
- polymathic-ai/gray_scott_reaction_diffusion
- polymathic-ai/active_matter
- polymathic-ai/turbulent_radiative_layer_2D
- polymathic-ai/supernova_explosion_64
- polymathic-ai/turbulence_gravity_cooling
- polymathic-ai/rayleigh_benard
- polymathic-ai/planetswe
- polymathic-ai/acoustic_scattering_inclusions
- polymathic-ai/MHD_64
- polymathic-ai/rayleigh_taylor_instability
- polymathic-ai/acoustic_scattering_discontinuous
- polymathic-ai/acoustic_scattering_maze
- polymathic-ai/helmholtz_staircase
- polymathic-ai/viscoelastic_instability
- BGLab/FlowBench
license: mit
---

# Walrus: A Cross-Domain Foundation Model for Continuum Dynamics

[License: MIT](https://opensource.org/licenses/MIT)
[Code](https://github.com/PolymathicAI/walrus)
[Paper](https://arxiv.org/abs/2511.15684)

Walrus is a large-scale **physics foundation model** capable of modeling a broad range of continuum dynamical systems.

Walrus is trained jointly across **19 diverse physical domains** spanning:
- astrophysics
- geoscience
- rheology
- plasma physics
- acoustics
- classical fluids

These systems have diverse boundary conditions and physical parameterizations. The model is optimized to serve as a **general-purpose surrogate** for physical simulation and a **strong initialization** for downstream fine-tuning on new PDE systems.
---

# Model Description
Walrus is a **1.3B-parameter space–time Transformer** trained autoregressively to predict the temporal evolution of physical fields. A simulation snapshot at time t is written as u(t).

We define the difference between two consecutive snapshots as:

u(t+1) = u(t+1) − u(t)

Given a short history of τ snapshots:

U(t) = [u(t − τ + 1), ..., u(t)]

the model M predicts the next state as:

u(t+1) ≈ u(t) + M(U(t))
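The residual update above can be turned into an autoregressive rollout by feeding each prediction back into the history window. A minimal sketch follows; `toy_model` is a stand-in for the real network, and the function names are illustrative rather than the Walrus API:

```python
import numpy as np

def rollout(model, history, n_steps):
    # Autoregressive rollout: at each step the model predicts the residual
    # Δu(t+1) = u(t+1) − u(t) from the most recent history window U(t).
    history = list(history)
    tau = len(history)
    states = []
    for _ in range(n_steps):
        U = np.stack(history[-tau:])    # U(t) = [u(t − τ + 1), ..., u(t)]
        delta = model(U)                # M(U(t)) ≈ Δu(t+1)
        u_next = history[-1] + delta    # u(t+1) = u(t) + M(U(t))
        states.append(u_next)
        history.append(u_next)          # predictions re-enter the window
    return states

# Toy stand-in "model": persist the last observed difference.
toy_model = lambda U: U[-1] - U[-2]
traj = rollout(toy_model, [np.zeros(4), np.ones(4)], n_steps=3)
```

Because each output becomes the next input, errors compound over the rollout, which is why the long-horizon stability techniques below matter.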
### Key architectural components

- **Adaptive-compute patch embedding**
  - Token count automatically balanced across resolutions
  - Enables mixing 2D and 3D datasets efficiently

- **Patch jittering**
  - A harmonic-analysis–motivated augmentation technique
  - Reduces aliasing and spectral artifacts
  - Improves long-horizon stability on 17 of the 19 pretraining datasets

- **Tensor-law–aware data augmentation**
  - 2D data embedded into 3D through plane rotations
  - Vector/tensor fields rotated with the correct physical transformations

- **Asymmetric normalization**
  - Inputs are normalized by their RMS over space-time
  - The predicted Δu is de-normalized using the RMS of Δu
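The asymmetric normalization scheme can be illustrated with a minimal sketch: the input window is scaled by its own space-time RMS, while the predicted residual is rescaled by the RMS statistics of past differences. The `eps` constant and the function names here are illustrative assumptions, not Walrus internals:

```python
import numpy as np

def rms(x, eps=1e-8):
    # Root-mean-square over all axes (space and time).
    # eps is an illustrative stabilizer, not a documented Walrus constant.
    return np.sqrt(np.mean(np.square(x))) + eps

def predict_step(model, U, past_deltas):
    # Asymmetric normalization: inputs and outputs use different scales.
    in_scale = rms(U)                       # RMS of the input window
    delta_scale = rms(past_deltas)          # RMS of past Δu
    delta_hat = model(U / in_scale)         # model sees normalized inputs
    return U[-1] + delta_scale * delta_hat  # de-normalize Δu, apply residual
```

Scaling the residual by Δu statistics rather than input statistics keeps the prediction well-conditioned even when the per-step change is much smaller than the field magnitude.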

---

# Pretraining Details

Walrus is pretrained on 19 physical datasets with:
- **Loss**: Per-field normalized L1 loss
- **Optimizer**: AdamW
- **Batching**: System-uniform hierarchical sampling
- **Time striding**: Random stride (1–5) per training example
- **Patch jitter range**: Uniform per-axis random offset
- **Dimensional unification**: 2D fields embedded as thin 3D volumes
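The random time-striding step can be sketched as follows: a stride is drawn per example, and the history plus target frames are sampled with that spacing. The clipping and window-placement details here are illustrative assumptions, not the exact pretraining pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_window(trajectory, tau, max_stride=5):
    # Random time striding: draw a stride s in [1, max_stride], then take
    # tau history frames plus one target frame, all spaced s steps apart.
    T = len(trajectory)
    s = int(rng.integers(1, max_stride + 1))
    s = min(s, (T - 1) // tau)                 # keep the window inside the data
    start = int(rng.integers(0, T - tau * s))  # random placement of the window
    idx = start + s * np.arange(tau + 1)
    frames = [trajectory[i] for i in idx]
    return frames[:-1], frames[-1]             # history U(t), target u(t+1)
```

Varying the stride exposes the model to multiple effective time scales from the same stored trajectories.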

The model was pretrained on 96 **NVIDIA H100 GPUs** using distributed HSDP (4 GPUs per shard group), with data sampling matched to the sharding structure to minimize wasted compute.
---

# Intended Use

This pretrained checkpoint is suitable for:

### ✔ Next-step prediction
### ✔ Fast surrogate simulation
### ✔ Autoregressive rollout of physical systems
### ✔ Transfer learning to new physical settings
# Resources

Paper: https://arxiv.org/pdf/2511.15684
GitHub: https://github.com/PolymathicAI/walrus
Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks
Note: the training code in the repository is closely coupled with tools from [the Well](https://github.com/PolymathicAI/the_well), so it can be beneficial to format data to match that schema. If that's not possible, the tutorial shows how to use the model without Well-formatted data.

# Demonstrated downstream tasks

We demonstrate the strong performance of Walrus by finetuning it on a range of challenging downstream tasks, as shown in the paper. The finetuned Walrus checkpoints for these tasks are available at the following paths:
### PDEGym CE-RM: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main
### PDEBench CNS Turbulent: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main
### PDEBench CNS Random: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main
### FlowBench FPOSkelenton: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main
### The Well Postmerger Neutron Star: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main
### The Well Convective Envelope RSG: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main
### PDEArena Conditioned Incompressible NS: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main
### BubbleML 2.0 PoolBoil Subcooled: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main

Additional checkpoints not included in the Walrus collection on HF can be found [here](https://users.flatironinstitute.org/~polymathic/data/walrus_project_checkpoints/), though the endpoint is a bit finicky.

More finetuning checkpoints will continue to be added to HF over time.