arxiv:2604.18564

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

Published on Apr 20

Submitted by Haoyu Wu on Apr 21

The University of Hong Kong


Authors:

Haoyu Wu ,

Abstract

MultiWorld is a unified framework for multi-agent multi-view world modeling that achieves accurate multi-agent control while maintaining multi-view consistency through specialized modules for condition handling and global state encoding.

AI-generated summary

Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet, most existing approaches are limited to single-agent scenarios and fail to capture the complex interactions inherent in real-world multi-agent systems. We present MultiWorld, a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency. We introduce the Multi-Agent Condition Module to achieve precise multi-agent controllability, and the Global State Encoder to ensure coherent observations across different views. MultiWorld supports flexible scaling of agent and view counts, and synthesizes different views in parallel for high efficiency. Experiments on multi-player game environments and multi-robot manipulation tasks demonstrate that MultiWorld outperforms baselines in video fidelity, action-following ability, and multi-view consistency. Project page: https://multi-world.github.io/
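The action-conditioned formulation described in the abstract (historical frames plus current actions in, future frames out) can be illustrated with a minimal autoregressive rollout. This is only a sketch of the interface, not MultiWorld's implementation: `toy_predictor`, `rollout`, `context_len`, and the list-of-floats frame representation are all hypothetical names and simplifications introduced here, and the predictor is a trivial stand-in for a learned video generation model.

```python
from typing import Callable, List, Sequence

Frame = List[float]            # toy stand-in for one image
MultiViewFrame = List[Frame]   # one frame per camera view at a single timestep
JointAction = List[int]        # one action per agent at a single timestep

Predictor = Callable[[Sequence[MultiViewFrame], JointAction], MultiViewFrame]


def toy_predictor(history: Sequence[MultiViewFrame], actions: JointAction) -> MultiViewFrame:
    """Hypothetical stand-in for a learned model.

    Shifts every view of the most recent frame by the sum of all agents'
    actions, so each view reacts to the same joint action.
    """
    shift = float(sum(actions))
    return [[pixel + shift for pixel in view] for view in history[-1]]


def rollout(
    predict: Predictor,
    history: Sequence[MultiViewFrame],
    action_sequence: Sequence[JointAction],
    context_len: int = 4,
) -> List[MultiViewFrame]:
    """Autoregressively predict one multi-view frame per joint action."""
    frames = list(history)
    for actions in action_sequence:
        # Condition on the most recent frames and the current joint action.
        next_frame = predict(frames[-context_len:], actions)
        # Feed the prediction back in as history for the next step.
        frames.append(next_frame)
    return frames[len(history):]


# One past timestep, two views of two "pixels" each, two agents.
history = [[[0.0, 0.0], [1.0, 1.0]]]
action_sequence = [[1, 2], [0, -1]]
future = rollout(toy_predictor, history, action_sequence)
```

Here each joint action holds one entry per agent and each predicted timestep holds one frame per view, so the number of agents and views is set by the input shapes rather than fixed in the rollout loop.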


Community

Haoyuwu

Paper author Paper submitter about 19 hours ago

We present MultiWorld, a scalable multi-agent, multi-view video world model that generates action-controllable, multi-view-consistent videos for both multi-player games and multi-robot manipulation.