Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published Nov 7 • 52
Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings Paper • 2511.05017 • Published Nov 7 • 7
MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency Paper • 2510.25897 • Published Oct 29 • 16
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 175
Large Language Models Cannot Self-Correct Reasoning Yet Paper • 2310.01798 • Published Oct 3, 2023 • 36
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models Paper • 2308.13437 • Published Aug 25, 2023 • 4
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities Paper • 2308.12966 • Published Aug 24, 2023 • 11
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts Paper • 2309.15915 • Published Sep 27, 2023 • 2
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants Paper • 2310.00653 • Published Oct 1, 2023 • 3
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models Paper • 2309.09958 • Published Sep 18, 2023 • 19