[ICLR 2026] VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
Ye Liu
yeliudev
AI & ML interests
Vision & Language
Recent Activity
- upvoted a paper 6 days ago: GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
- updated a Space 13 days ago: PolyU-ChenLab/Video-Highlights
- upvoted a paper about 1 month ago: Mixture-of-Depths Attention
UniPixel
[NeurIPS 2025] UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
- 🔮 UniPixel: An MLLM for Unified Object Referring and Segmentation (Space • Running on Zero • 6 likes)
- PolyU-ChenLab/UniPixel-3B: Video-Text-to-Text • 4B • 118 downloads • 3 likes
- PolyU-ChenLab/UniPixel-7B: Video-Text-to-Text • 8B • 51 downloads • 1 like
- PolyU-ChenLab/UniPixel-SFT-1M: Preview • 132 downloads • 2 likes
R2-Tuning
[ECCV 2024] R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
VideoMind
[ICLR 2026] VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
- 💡 VideoMind 2B: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (Space • Running on Zero • 37 likes)
- yeliudev/VideoMind-2B: Video-Text-to-Text • 25 downloads • 2 likes
- yeliudev/VideoMind-7B: Video-Text-to-Text • 45 downloads • 4 likes
- yeliudev/VideoMind-Dataset: Preview • 4.36k downloads • 21 likes
E.T. Bench
[NeurIPS 2024] E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding