3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability Paper • 2409.00119 • Published Aug 28, 2024 • 1
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training Paper • 2510.04996 • Published Oct 6 • 15
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation Paper • 2505.06027 • Published May 9 • 18
view article Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Feb 11 • 89
Reward-Guided Speculative Decoding for Efficient LLM Reasoning Paper • 2501.19324 • Published Jan 31 • 39