-
Scaling Latent Reasoning via Looped Language Models
Paper • 2510.25741 • Published • 219 -
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
Paper • 2511.23319 • Published • 21 -
Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information
Paper • 2511.22176 • Published • 4 -
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
Paper • 2511.22265 • Published • 1
Collections
Discover the best community collections!
Collections including paper arxiv:2412.17256
-
Self-Taught Self-Correction for Small Language Models
Paper • 2503.08681 • Published • 15 -
Self-Improving Robust Preference Optimization
Paper • 2406.01660 • Published • 20 -
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Paper • 2503.00735 • Published • 23 -
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Paper • 2407.19594 • Published • 21
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 88 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47 -
OpenAI o1 System Card
Paper • 2412.16720 • Published • 36 -
Revisiting In-Context Learning with Long Context Language Models
Paper • 2412.16926 • Published • 32
-
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published • 188 -
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Paper • 2508.14029 • Published • 118 -
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Paper • 2509.25541 • Published • 140
-
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Paper • 2412.06559 • Published • 84 -
Maya: An Instruction Finetuned Multilingual Multimodal Model
Paper • 2412.07112 • Published • 28 -
OpenAI o1 System Card
Paper • 2412.16720 • Published • 36 -
Diving into Self-Evolving Training for Multimodal Reasoning
Paper • 2412.17451 • Published • 42
-
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 88 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47 -
Deliberation in Latent Space via Differentiable Cache Augmentation
Paper • 2412.17747 • Published • 32 -
Outcome-Refining Process Supervision for Code Generation
Paper • 2412.15118 • Published • 19
-
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47 -
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 88 -
NILE: Internal Consistency Alignment in Large Language Models
Paper • 2412.16686 • Published • 8
-
Scaling Latent Reasoning via Looped Language Models
Paper • 2510.25741 • Published • 219 -
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
Paper • 2511.23319 • Published • 21 -
Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information
Paper • 2511.22176 • Published • 4 -
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
Paper • 2511.22265 • Published • 1
-
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published • 188 -
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Paper • 2508.14029 • Published • 118 -
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Paper • 2509.25541 • Published • 140
-
Self-Taught Self-Correction for Small Language Models
Paper • 2503.08681 • Published • 15 -
Self-Improving Robust Preference Optimization
Paper • 2406.01660 • Published • 20 -
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Paper • 2503.00735 • Published • 23 -
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Paper • 2407.19594 • Published • 21
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Paper • 2412.06559 • Published • 84 -
Maya: An Instruction Finetuned Multilingual Multimodal Model
Paper • 2412.07112 • Published • 28 -
OpenAI o1 System Card
Paper • 2412.16720 • Published • 36 -
Diving into Self-Evolving Training for Multimodal Reasoning
Paper • 2412.17451 • Published • 42
-
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 88 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47 -
OpenAI o1 System Card
Paper • 2412.16720 • Published • 36 -
Revisiting In-Context Learning with Long Context Language Models
Paper • 2412.16926 • Published • 32
-
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 88 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47 -
Deliberation in Latent Space via Differentiable Cache Augmentation
Paper • 2412.17747 • Published • 32 -
Outcome-Refining Process Supervision for Code Generation
Paper • 2412.15118 • Published • 19
-
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47 -
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 88 -
NILE: Internal Consistency Alignment in Large Language Models
Paper • 2412.16686 • Published • 8