-
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 142 -
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 138 -
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 88
Collections
Discover the best community collections!
Collections including paper arxiv:2508.10874
-
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 97 -
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Paper • 2508.11987 • Published • 71 -
Deep Think with Confidence
Paper • 2508.15260 • Published • 88 -
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
Paper • 2508.07407 • Published • 98
-
Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report
Paper • 2508.01059 • Published • 32 -
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper • 2508.01191 • Published • 238 -
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Paper • 2508.05629 • Published • 180 -
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 192
-
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
Paper • 2508.07407 • Published • 98 -
A Survey on Diffusion Language Models
Paper • 2508.10875 • Published • 34 -
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 192 -
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Paper • 2508.09968 • Published • 15
-
Adapting Vision-Language Models Without Labels: A Comprehensive Survey
Paper • 2508.05547 • Published • 11 -
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
Paper • 2508.10751 • Published • 28 -
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 97 -
Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation
Paper • 2508.12040 • Published • 14
-
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 97 -
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper • 2508.01191 • Published • 238 -
Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models
Paper • 2508.03363 • Published • 1 -
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
Paper • 2507.14683 • Published • 134
-
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 142 -
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper • 2504.13837 • Published • 138 -
Learning to Reason under Off-Policy Guidance
Paper • 2504.14945 • Published • 88
-
Adapting Vision-Language Models Without Labels: A Comprehensive Survey
Paper • 2508.05547 • Published • 11 -
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
Paper • 2508.10751 • Published • 28 -
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 97 -
Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation
Paper • 2508.12040 • Published • 14
-
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 97 -
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Paper • 2508.11987 • Published • 71 -
Deep Think with Confidence
Paper • 2508.15260 • Published • 88 -
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
Paper • 2508.07407 • Published • 98
-
Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report
Paper • 2508.01059 • Published • 32 -
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper • 2508.01191 • Published • 238 -
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Paper • 2508.05629 • Published • 180 -
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 192
-
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 97 -
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper • 2508.01191 • Published • 238 -
Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models
Paper • 2508.03363 • Published • 1 -
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
Paper • 2507.14683 • Published • 134
-
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
Paper • 2508.07407 • Published • 98 -
A Survey on Diffusion Language Models
Paper • 2508.10875 • Published • 34 -
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 192 -
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Paper • 2508.09968 • Published • 15