Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents • Paper • 2605.29447 • Published 10 days ago • 20
Data-Gouv-FR/caracteristiques-et-localisation-des-stations-de-recharge-supercharger-tesla • Viewer • Updated 7 days ago • 117 • 55 • 1
How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum • Paper • 2604.25907 • Published Apr 28 • 4
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability • Paper • 2604.06628 • Published Apr 8 • 327
Graph-Based Chain-of-Thought Pruning for Reducing Redundant Reflections in Reasoning LLMs • Paper • 2604.05643 • Published Apr 7 • 13
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning • Paper • 2604.02721 • Published Apr 3 • 632