Collections

Collections including paper arxiv:2508.10874

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 142
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 138
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21 • 88

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14 • 97
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Paper • 2508.11987 • Published Aug 16 • 71
Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21 • 88
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

Paper • 2508.07407 • Published Aug 10 • 98

research-catchup

Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report

Paper • 2508.01059 • Published Aug 1 • 32
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 238
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 180
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8 • 192

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14 • 97
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

Paper • 2508.09834 • Published Aug 13 • 53

Research and ideas

A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

Paper • 2508.07407 • Published Aug 10 • 98
A Survey on Diffusion Language Models

Paper • 2508.10875 • Published Aug 14 • 34
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published Aug 8 • 192
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Paper • 2508.09968 • Published Aug 13 • 15

Adapting Vision-Language Models Without Labels: A Comprehensive Survey

Paper • 2508.05547 • Published Aug 7 • 11
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published Aug 14 • 28
SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14 • 97
Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation

Paper • 2508.12040 • Published Aug 16 • 14

TempFlow-GRPO: When Timing Matters for GRPO in Flow Models

Paper • 2508.04324 • Published Aug 6 • 11
DINOv3

Paper • 2508.10104 • Published Aug 13 • 285
SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14 • 97

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14 • 97
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 238
Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models

Paper • 2508.03363 • Published Aug 5 • 1
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization

Paper • 2507.14683 • Published Jul 19 • 134

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14 • 97

TsinghuaC3I/SSRL

Preview • Updated Aug 5 • 194 • 2
TsinghuaC3I/Llama-3.1-8B-Instruct-SSRL

Text Generation • 8B • Updated Aug 5 • 6
TsinghuaC3I/Llama-3.2-3B-Instruct-SSRL

Text Generation • 4B • Updated Aug 5 • 7
TsinghuaC3I/Qwen2.5-7B-Instruct-SSRL

Text Generation • 8B • Updated Aug 5 • 6
