Muon Outperforms Adam in Tail-End Associative Memory Learning Paper • 2509.26030 • Published Sep 30 • 19
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26 • 70
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
The Ultra-Scale Playbook 🌌: The ultimate guide to training LLM on large GPU Clusters • 3.55k
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1, 2024 • 40
Bootstrapping Language Models with DPO Implicit Rewards Paper • 2406.09760 • Published Jun 14, 2024 • 41