Add evaluation results for GPQA and HLE

## Evaluation Results

This PR adds evaluation results extracted from the [GLM-5 model card](https://huggingface.co/zai-org/GLM-5).

**Benchmarks:**
- HLE: 30.5
- HLE (with tools): 50.4
- GPQA (Diamond): 86.0

**Files created:**
- .eval_results/hle.yaml
- .eval_results/hle_with_tools.yaml
- .eval_results/gpqa.yaml

---

Extracted automatically using the [LLM-powered evaluation extractor](https://github.com/huggingface/community-evals).

Files changed (3)

.eval_results/gpqa.yaml +8 -0
.eval_results/hle.yaml +7 -0
.eval_results/hle_with_tools.yaml +8 -0

.eval_results/gpqa.yaml ADDED

	@@ -0,0 +1,8 @@

+- dataset:
+    id: Idavidrein/gpqa
+    task_id: gpqa_diamond
+  value: 86.0
+  date: '2026-02-13'
+  source:
+    url: https://huggingface.co/zai-org/GLM-5
+    name: Model Card

.eval_results/hle.yaml ADDED

	@@ -0,0 +1,7 @@

+- dataset:
+    id: cais/hle
+  value: 30.5
+  date: '2026-02-13'
+  source:
+    url: https://huggingface.co/zai-org/GLM-5
+    name: Model Card

.eval_results/hle_with_tools.yaml ADDED

	@@ -0,0 +1,8 @@

+- dataset:
+    id: cais/hle
+  value: 50.4
+  date: '2026-02-13'
+  source:
+    url: https://huggingface.co/zai-org/GLM-5
+    name: Model Card
+  note: with tools