Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published Mar 23 • 124
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 16 days ago • 48
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 11 days ago • 110
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation Paper • 2604.19636 • Published 5 days ago • 82
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 17 days ago • 240
ELT: Elastic Looped Transformers for Visual Generation Paper • 2604.09168 • Published 16 days ago • 19
Article • The First Healthcare Robotics Dataset and Foundational Physical AI Models for Healthcare Robotics • Mar 16 • 29
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published Mar 12 • 91
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published Feb 15 • 53
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published Dec 18, 2025 • 76
LongVie 2: Multimodal Controllable Ultra-Long Video World Model Paper • 2512.13604 • Published Dec 15, 2025 • 76
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards Paper • 2512.00473 • Published Nov 29, 2025 • 27
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models Paper • 2512.08829 • Published Dec 9, 2025 • 21
OmniPSD: Layered PSD Generation with Diffusion Transformer Paper • 2512.09247 • Published Dec 10, 2025 • 51
Composing Concepts from Images and Videos via Concept-prompt Binding Paper • 2512.09824 • Published Dec 10, 2025 • 28
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation Paper • 2512.09363 • Published Dec 10, 2025 • 74
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality Paper • 2512.07951 • Published Dec 8, 2025 • 51
Light-X: Generative 4D Video Rendering with Camera and Illumination Control Paper • 2512.05115 • Published Dec 4, 2025 • 11