VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization Paper • 2606.02564 • Published 1 day ago • 20
How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning Paper • 2605.27310 • Published 8 days ago • 20
IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools Paper • 2605.20682 • Published 14 days ago • 83
WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild Paper • 2605.01018 • Published May 1 • 9
Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty? Paper • 2605.12684 • Published 22 days ago • 11
Elucidating the SNR-t Bias of Diffusion Probabilistic Models Paper • 2604.16044 • Published Apr 17 • 73
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 242