VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models Paper • 2511.11007 • Published 25 days ago • 15
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published 20 days ago • 74
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views Paper • 2510.18632 • Published Oct 21 • 21
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow Paper • 2509.21789 • Published Sep 26 • 9
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning Paper • 2509.23768 • Published Sep 28 • 48
DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing Paper • 2510.02253 • Published Oct 2 • 14
CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation Paper • 2506.23121 • Published Jun 29 • 2