Sa2VA Model Zoo Collection • Hugging Face model zoo for Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos • By Bytedance Seed CV Research • 12 items • Updated 12 days ago • 44
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30 • 33
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30 • 116
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Paper • 2504.10462 • Published Apr 14 • 15
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query Paper • 2506.03144 • Published Jun 3 • 7
CyberV: Cybernetics for Test-time Scaling in Video Understanding Paper • 2506.07971 • Published Jun 9 • 5
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published Oct 23 • 55
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published Oct 21 • 36