SeeUPO: Sequence-Level Agentic-RL with Convergence Guarantees Paper • 2602.06554 • Published 14 days ago • 5
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger Paper • 2602.08222 • Published 11 days ago • 259
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published 10 days ago • 190
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 15 days ago • 320
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning Paper • 2602.10560 • Published 9 days ago • 28
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published 8 days ago • 57
Rethinking the Trust Region in LLM Reinforcement Learning Paper • 2602.04879 • Published 16 days ago • 34
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published 19 days ago • 40
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 22 days ago • 154
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published 21 days ago • 191
Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published Dec 18, 2025 • 36
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published Dec 22, 2025 • 64
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics Paper • 2512.21010 • Published Dec 24, 2025 • 4
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published Dec 23, 2025 • 16
Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards Paper • 2512.21625 • Published Dec 25, 2025 • 4
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published Dec 31, 2025 • 119