What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models Paper • 2601.06165 • Published 7 days ago • 14
COMPASS Collection A Framework for Evaluating Organization-Specific Policy Alignment in LLMs • 5 items • Updated 7 days ago • 4
Everyday Physics in Korean Contexts: A Culturally Grounded Physical Reasoning Benchmark Paper • 2509.17807 • Published Sep 22, 2025 • 1
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 18
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs Paper • 2601.01836 • Published 8 days ago • 6
AIM-Intelligence/COMPASS-Policy-Alignment-Testbed-Dataset Viewer • Updated 8 days ago • 5.92k • 129 • 10