MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs Paper • 2511.07250 • Published 28 days ago • 17
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs Paper • 2510.10689 • Published Oct 12 • 46
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents Paper • 2507.22827 • Published Jul 30 • 99
VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning Paper • 2510.10518 • Published Oct 12 • 18
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs Paper • 2510.10689 • Published Oct 12 • 46
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? Paper • 2509.04292 • Published Sep 4 • 57
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs Paper • 2504.15415 • Published Apr 21 • 22
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model Paper • 2504.10068 • Published Apr 14 • 30
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7 • 44
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11 • 71