| title: ATLAS Benchmark | |
| emoji: 🧪 | |
| colorFrom: green | |
| colorTo: indigo | |
| sdk: gradio | |
| app_file: app.py | |
| pinned: true | |
| license: apache-2.0 | |
| short_description: ATLAS benchmark for frontier scientific reasoning | |
| sdk_version: 5.43.1 | |
| hf_oauth: true | |
| tags: | |
| - leaderboard | |
| - science | |
| - benchmark | |
| - evaluation | |
| # ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning | |
| ATLAS evaluates the scientific reasoning capabilities of large language models (LLMs) across seven core fields that cover the key domains of AI for Science (AI4S): | |
| - Mathematics | |
| - Physics | |
| - Chemistry | |
| - Biology | |
| - Computer Science | |
| - Earth Science | |
| - Materials Science | |