TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs Paper • 2512.14698 • Published 13 days ago • 19
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation Paper • 2511.19320 • Published Nov 24 • 42
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Paper • 2511.03334 • Published Nov 5 • 52
CaRe Collection CaReBench data, CaRe models and all the contrastively trained MLLMs (including InternVL2, MiniCPM-V 2.6, LLaVA NeXT Video, Qwen2-VL and Tarsier). • 6 items • Updated Mar 17 • 1
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation Paper • 2410.23090 • Published Oct 30, 2024 • 55