Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 4 days ago • 49
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 4 days ago • 173
Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search Paper • 2605.20244 • Published 18 days ago • 4
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Paper • 2605.23271 • Published 14 days ago • 79
bilabila/b-b7_olr_ts10_gru_hib_costdyn_util_w3_sym7_202601_lossq_ms400k_h12 68k • Updated 13 days ago • 383 • 1
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published 15 days ago • 169
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes Paper • 2605.15843 • Published 21 days ago • 6
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models Paper • 2605.14906 • Published 22 days ago • 76
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices Paper • 2605.10933 • Published 25 days ago • 3
Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts Paper • 2602.03473 • Published 28 days ago • 11