Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes? Paper • 2506.14805 • Published Jun 3
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models Paper • 2406.14952 • Published Jun 21, 2024
Reflection-Bench: probing AI intelligence with reflection Paper • 2410.16270 • Published Oct 21, 2024