Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings Paper • 2509.14405 • Published Sep 17 • 2
Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans Paper • 2506.22439 • Published May 29 • 3
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments Paper • 2509.14233 • Published Sep 17 • 12
La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America Paper • 2507.00999 • Published Jul 1 • 1
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations Paper • 2507.13302 • Published Jul 17 • 4
UnMix-NeRF: Spectral Unmixing Meets Neural Radiance Fields Paper • 2506.21884 • Published Jun 27 • 12
Rubrik's Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset Paper • 2503.23899 • Published Mar 31
Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction Paper • 2303.14342 • Published Mar 25, 2023
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation Paper • 2504.07072 • Published Apr 9 • 9
It's the same but not the same: Do LLMs distinguish Spanish varieties? Paper • 2504.20049 • Published Apr 8
Spanish and LLM Benchmarks: is MMLU Lost in Translation? Paper • 2406.17789 • Published May 28, 2024 • 2
How Stable is Stable Diffusion under Recursive InPainting (RIP)? Paper • 2407.09549 • Published Jun 27, 2024 • 1
Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal Paper • 2408.16012 • Published Aug 16, 2024
Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail? Paper • 2409.15334 • Published Sep 8, 2024 • 1
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong Paper • 2501.09775 • Published Jan 16 • 33
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation Paper • 2412.03304 • Published Dec 4, 2024 • 21
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models Paper • 2311.16079 • Published Nov 27, 2023 • 19