Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards Paper • 2602.10231 • Published Feb 10 • 13