TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation Paper • 2605.22355 • Published 3 days ago • 167
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization Paper • 2605.13641 • Published 11 days ago • 48
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 11 days ago • 262
T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning Paper • 2605.02178 • Published 20 days ago • 10
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 240
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models Paper • 2604.10949 • Published Apr 13 • 40
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 325