On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published 2 days ago • 23
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published 23 days ago • 132
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published 28 days ago • 194
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations Paper • 2506.13651 • Published Jun 16 • 8
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19 • 45
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 77
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback Paper • 2312.00849 • Published Dec 1, 2023 • 12
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective Paper • 2502.17262 • Published Feb 24 • 22
CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding Paper • 2405.02384 • Published May 3, 2024