Submitted by Yi Yang (SII)
Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
DENG Lab @ SJTU