Sapiens2

Sapiens2 is a family of high-resolution vision transformers pretrained on 1 billion human images and designed for human-centric tasks such as pose estimation, body-part segmentation, surface normal estimation, and pointmap estimation.

This is the index repository: each variant lives in its own model repo (linked below).
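Because each variant lives in its own repo, fetching a checkpoint comes down to picking a repo ID from this index and downloading that repo. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. `REPOS`, `repo_id`, and `download` are hypothetical names introduced here for illustration; only the repo IDs come from this index. Loading the downloaded weights into a model is not covered by this index and is not shown.

```python
# Family/size pairs copied from the tables in this index. The available sizes
# differ per family, so they are listed explicitly rather than generated.
REPOS = {
    "pretrain": ("0.1b", "0.4b", "0.8b", "1b", "1b-4k", "5b"),
    "pose": ("0.4b", "0.8b", "1b", "5b"),
    "seg": ("0.4b", "0.8b", "1b", "5b"),
}


def repo_id(family: str, size: str) -> str:
    """Return the repo ID for a family/size pair listed in this index.

    Hypothetical helper: it only validates against REPOS and formats the name.
    """
    if family not in REPOS:
        raise ValueError(f"unknown family {family!r}; expected one of {sorted(REPOS)}")
    if size not in REPOS[family]:
        raise ValueError(
            f"{family!r} has no {size!r} variant; expected one of {REPOS[family]}"
        )
    return f"facebook/sapiens2-{family}-{size}"


def download(family: str, size: str) -> str:
    """Download every file in the chosen repo and return the local directory."""
    # Imported lazily so repo_id() works without the dependency installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id(family, size))


# Example (requires network access; access to a repo may depend on its license terms):
# local_dir = download("pose", "0.4b")
```

Listing the sizes explicitly makes a request for a variant that is not in this index, such as a 0.1B pose checkpoint, fail early with a `ValueError` rather than at download time.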

Pretrained Backbones

| Model | Params | Repository |
|---|---|---|
| Sapiens2-0.1B | 0.114 B | facebook/sapiens2-pretrain-0.1b |
| Sapiens2-0.4B | 0.398 B | facebook/sapiens2-pretrain-0.4b |
| Sapiens2-0.8B | 0.818 B | facebook/sapiens2-pretrain-0.8b |
| Sapiens2-1B | 1.462 B | facebook/sapiens2-pretrain-1b |
| Sapiens2-1B (4K) | 1.607 B | facebook/sapiens2-pretrain-1b-4k |
| Sapiens2-5B | 5.071 B | facebook/sapiens2-pretrain-5b |

Task Checkpoints

Pose Estimation

| Model | Repository |
|---|---|
| Sapiens2-0.4B | facebook/sapiens2-pose-0.4b |
| Sapiens2-0.8B | facebook/sapiens2-pose-0.8b |
| Sapiens2-1B | facebook/sapiens2-pose-1b |
| Sapiens2-5B | facebook/sapiens2-pose-5b |

Body-Part Segmentation

| Model | Repository |
|---|---|
| Sapiens2-0.4B | facebook/sapiens2-seg-0.4b |
| Sapiens2-0.8B | facebook/sapiens2-seg-0.8b |
| Sapiens2-1B | facebook/sapiens2-seg-1b |
| Sapiens2-5B | facebook/sapiens2-seg-5b |

Surface Normal Estimation

Pointmap Estimation

License

Released under the Sapiens2 License.

Citation

@article{khirodkarsapiens2,
  title={Sapiens2},
  author={Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan and Su, Zhaoen and Saito, Shunsuke},
  journal={arXiv preprint arXiv:2604.21681},
  year={2026}
}