Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels Paper • 2601.21268 • Published 5 days ago
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning Paper • 2601.19001 • Published 7 days ago
Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units Paper • 2601.21996 • Published 4 days ago
ECO: Quantized Training without Full-Precision Master Weights Paper • 2601.22101 • Published 4 days ago
KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices Paper • 2601.21579 • Published 5 days ago
Beyond Imitation: Reinforcement Learning for Active Latent Planning Paper • 2601.21598 • Published 5 days ago
Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts Paper • 2601.22156 • Published 4 days ago
Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening Paper • 2601.21590 • Published 5 days ago
Self-Improving Pretraining: using post-trained models to pretrain better models Paper • 2601.21343 • Published 5 days ago
Language-based Trial and Error Falls Behind in the Era of Experience Paper • 2601.21754 • Published 4 days ago
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published 5 days ago
Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization Paper • 2601.21358 • Published 5 days ago
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published 3 days ago
Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification Paper • 2601.22642 • Published 4 days ago
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment Paper • 2601.20218 • Published 6 days ago
SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization Paper • 2601.22491 • Published 4 days ago
FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation Paper • 2601.23182 • Published 3 days ago
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought Paper • 2601.23184 • Published 3 days ago