re paper
- Scaling RL to Long Videos
  Paper • 2507.07966 • Published • 162
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 320
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
  Paper • 2507.14111 • Published • 25
- MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
  Paper • 2507.21183 • Published • 15
laner ten
that113
AI & ML interests
None yet
Recent Activity
- upvoted an article 28 days ago: Proximal Policy Optimization (PPO)
- upvoted a collection 3 months ago: RLHF Papers
- updated a collection 3 months ago: re paper
Organizations
None yet
d
- The Ultra-Scale Playbook
  Running • 3.74k • 🌌 The ultimate guide to training LLM on large GPU Clusters
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
  Paper • 2504.02587 • Published • 32
- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 71
- microsoft/Magma-8B
  Robotics • 9B • Updated • 991 • 414
models 0
None public yet
datasets 0
None public yet