KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints Paper • 2510.19316 • Published Oct 22, 2025 • 11
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation Paper • 2509.23866 • Published Sep 28, 2025 • 13
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 180
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage Paper • 2412.15606 • Published Dec 20, 2024 • 2
LongViTU: Instruction Tuning for Long-Form Video Understanding Paper • 2501.05037 • Published Jan 9, 2025 • 1
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models Paper • 2407.11522 • Published Jul 16, 2024 • 9
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding Paper • 2401.09340 • Published Jan 17, 2024 • 21