Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models Paper • 2603.01571 • Published 6 days ago • 32
From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation Paper • 2601.18533 • Published Jan 26
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published 6 days ago • 51
CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation Paper • 2601.11096 • Published Jan 16 • 8
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published Mar 31, 2025 • 55
Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge Paper • 2502.12501 • Published Feb 18, 2025 • 6
NILE: Internal Consistency Alignment in Large Language Models Paper • 2412.16686 • Published Dec 21, 2024 • 8
Collaborative Performance Prediction for Large Language Models Paper • 2407.01300 • Published Jul 1, 2024 • 2
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References Paper • 2410.05193 • Published Oct 7, 2024 • 14