NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation • Paper • 2601.02204 • Published Jan 5 • 62
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding • Paper • 2601.14724 • Published 27 days ago • 74
VIOLA: Towards Video In-Context Learning with Minimal Annotations • Paper • 2601.15549 • Published 27 days ago • 4