BioKinema: Physically Grounded Generative Modeling of All-Atom Biomolecular Dynamics

Introduction

BioKinema is a physically grounded generative model that predicts continuous-time, all-atom biomolecular trajectories at a fraction of the cost of traditional molecular dynamics (MD) simulations. It is built on top of Protenix (ByteDance's AlphaFold 3 reproduction) and extends it with a temporal-attention mechanism derived from Langevin dynamics, so a single model can roll out MD-like trajectories at arbitrary, possibly non-uniform frame intervals.

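The temporal-attention bias is B_ij = -λ |t_i - t_j|^β, where λ is a per-head learnable decay (ALiBi-initialised) and β is a fixed time-scaling exponent selected per model variant. Because the bias is added to the attention logits, the softmax weight between frames i and j is scaled by exp(-λ |t_i - t_j|^β). This is a stretched-exponential decay in the time gap.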
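Here is a worked example. It is an illustration, not taken from the paper. Take three frames at non-uniform times t = (0, 1, 4), in whatever time unit the model uses, and a single head with decay λ. The pairwise gaps are 1, 4 and 3, so the bias matrices for the two released exponents are:

```latex
% Gaps for t = (0, 1, 4): |t_1 - t_2| = 1, |t_1 - t_3| = 4, |t_2 - t_3| = 3
B^{(\beta = 0.5)} = -\lambda
\begin{pmatrix}
0 & 1 & 2 \\
1 & 0 & \sqrt{3} \\
2 & \sqrt{3} & 0
\end{pmatrix}
\approx -\lambda
\begin{pmatrix}
0 & 1 & 2 \\
1 & 0 & 1.732 \\
2 & 1.732 & 0
\end{pmatrix},
\qquad
B^{(\beta = 0.25)} = -\lambda
\begin{pmatrix}
0 & 1 & \sqrt{2} \\
1 & 0 & 3^{1/4} \\
\sqrt{2} & 3^{1/4} & 0
\end{pmatrix}
\approx -\lambda
\begin{pmatrix}
0 & 1 & 1.414 \\
1 & 0 & 1.316 \\
1.414 & 1.316 & 0
\end{pmatrix}
```

For gaps larger than one time unit, β = 0.25 penalises distant frames less than β = 0.5. For example, a gap of 4 gives 1.414 versus 2. For gaps below one unit the ordering reverses. As for the initial λ values, one plausible reading of "ALiBi-initialised" is the standard ALiBi slope schedule λ_h = 2^(-8h/H) for heads h = 1, …, H. The exact initial values used by BioKinema are an assumption here.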

This Hugging Face repository hosts the released weights and processed data. For installation, inference, training, the data pipeline, and the benchmark code used in the manuscript, see the BioKinema GitHub repository.

Repository Contents

  • BioKinema_atlas+misato+mdposit_sqrt.pt (~3.9 GB): "sqrt" checkpoint (EMA weights; β = 0.5, i.e. a square-root time scaling). For protein–ligand complexes and short-time MD. Trained on Atlas + MISATO + MDposit.
  • BioKinema_CATH+octapeptide_beta0.25.pt (~3.9 GB): β = 0.25 checkpoint (EMA weights). For long-time, single-chain protein MD. Trained on MSR (CATH / MegaSim / octapeptides) with β = 0.25 and an additional TICA-dynamics loss.
  • biokinema_codec_bundle.tar (~41 GB): processed MISATO / MDposit / unbinding data in a lossless compressed codec (one template bioassembly per trajectory plus a stacked-coordinate array). Used to train the sqrt checkpoint.

Which Checkpoint to Use

  • Complexes / protein–ligand, or short MD → BioKinema_atlas+misato+mdposit_sqrt.pt (run with β = 0.5).
  • Long single-chain protein MD / kinetics → BioKinema_CATH+octapeptide_beta0.25.pt (run with β = 0.25).

The exponent β must match the checkpoint at inference time (pass it via --beta).
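A small shell function can derive β from the checkpoint filename, which keeps --beta in sync with the released checkpoints. This is a hypothetical helper and is not part of the BioKinema repository. It recognises only the two released filenames listed above and refuses anything else, rather than guessing.

```shell
# Hypothetical helper (not shipped with BioKinema): print the beta value that
# matches a released checkpoint, based only on its filename.
beta_for_checkpoint() {
  case "$(basename "$1")" in
    BioKinema_atlas+misato+mdposit_sqrt.pt)
      echo 0.5 ;;   # "sqrt" checkpoint: |t_i - t_j|^0.5
    BioKinema_CATH+octapeptide_beta0.25.pt)
      echo 0.25 ;;  # long-time, single-chain checkpoint
    *)
      # Unknown or renamed file: fail loudly instead of guessing a beta.
      echo "unknown checkpoint '$1': pass --beta explicitly" >&2
      return 1 ;;
  esac
}
```

With this helper, the inference command can pass --beta "$(beta_for_checkpoint "$CKPT")", where CKPT holds the checkpoint path. A renamed or fine-tuned checkpoint falls through to the error branch, so its β must be set by hand.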

Usage

Clone the BioKinema repository, install the environment, then run inference:

bash inference.sh \
    --checkpoint_path ./checkpoints/BioKinema_atlas+misato+mdposit_sqrt.pt \
    --dump_dir ./output \
    --input_file ./experiments/atlas_benchmark/init_frames/7lp1_A_R1_0.cif \
    --beta 0.5

DDP checkpoints, whose state-dict keys carry a "module." prefix, are loaded automatically by the inference runner. No manual key renaming is needed.

Codec Bundle

mkdir -p "$BIOKINEMA_UNBINDING_ROOT"
tar -xf biokinema_codec_bundle.tar -C "$BIOKINEMA_UNBINDING_ROOT"
# -> $BIOKINEMA_UNBINDING_ROOT/{misato_codec,mdposit_codec,unbinding_codec}

Point each dataset's bioassembly_dict_dir at the corresponding *_codec/ directory. The data loader detects the codec format automatically and decompresses it on the fly.
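Before editing the dataset configs, a quick sanity check can confirm the extracted layout. The sketch below is hypothetical and the function name is made up. It assumes only the three *_codec directories shown in the tar comment above, and prints the path that each dataset's bioassembly_dict_dir should point at.

```shell
# Hypothetical sanity check (not part of the BioKinema repository): verify that
# the codec bundle was extracted into <root> and print, for each dataset, the
# directory its bioassembly_dict_dir setting should point at.
check_codec_dirs() {
  local root="$1" missing=0 name dir
  for name in misato mdposit unbinding; do
    dir="$root/${name}_codec"
    if [ -d "$dir" ]; then
      printf '%s: bioassembly_dict_dir=%s\n' "$name" "$dir"
    else
      # Report every missing directory, then fail at the end.
      printf 'missing: %s\n' "$dir" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_codec_dirs "$BIOKINEMA_UNBINDING_ROOT" prints one line per dataset. It returns a non-zero status if any codec directory is absent.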

Training Data

  • Atlas — public ATLAS MD database (preprocessing scripts shipped in the code repository).
  • MSR (CATH / Octapeptides / MegaSim) — Zenodo: 10.5281/zenodo.15629740, 10.5281/zenodo.15641199, 10.5281/zenodo.15641184.
  • MISATO / MDposit / unbinding — released here as the compressed codec bundle above.

Citation

@article{feng2026physically,
  title={Physically Grounded Generative Modeling of All-Atom Biomolecular Dynamics},
  author={Feng, Bin and Zhang, Jiying and Zhang, Xinni and Zhang, Ming and Barth, Patrick and Liu, Zijing and Li, Yu},
  journal={bioRxiv},
  pages={2026--02},
  year={2026},
  publisher={Cold Spring Harbor Laboratory}
}

Acknowledgements

This project is built on Protenix, an open-source biomolecular structure prediction framework developed by ByteDance.

Contact

For questions or collaborations, please open an issue or contact us at [email protected].