ZHANGYUXUAN-zR SaylorTwift HF Staff committed on
Commit b8a9fc5 · 1 Parent(s): 360b49d

Add evaluation results for GPQA, HLE (#22)

- Add evaluation results for GPQA, HLE (25f841a1ce34c104d54d5ea52adef53e47976de0)
- Update .eval_results/hle.yaml (22c99c0ca7ae8b010ee0577a7acbe01246d13415)
- Update .eval_results/hle_with_tools.yaml (f7d34fbc5116ab7837d7fa2def080ce402a7dd87)
- Update .eval_results/hle_with_tools.yaml (a185b9b508c09353db5c426449b62cef6fcbddac)
- Update .eval_results/hle_with_tools.yaml (c3a7b733fcb172baa9dee798ad4dd3e08504c32a)
- Update .eval_results/gpqa.yaml (0dbd2351ed12335e8c31bcdb34193925735e6899)
- Update .eval_results/hle_with_tools.yaml (ced4d5301501da279a87495a6c144fd8201bb640)


Co-authored-by: Nathan Habib <SaylorTwift@users.noreply.huggingface.co>

.eval_results/gpqa.yaml ADDED
@@ -0,0 +1,8 @@
+ - dataset:
+     id: Idavidrein/gpqa
+     task_id: diamond
+   value: 86.0
+   date: '2026-02-13'
+   source:
+     url: https://huggingface.co/zai-org/GLM-5
+     name: Model Card
.eval_results/hle.yaml ADDED
@@ -0,0 +1,9 @@
+ - dataset:
+     id: cais/hle
+     task_id: hle
+   value: 30.5
+   date: '2026-02-13'
+   source:
+     url: https://huggingface.co/zai-org/GLM-5
+     name: Model Card
+     user: SaylorTwift
.eval_results/hle_with_tools.yaml ADDED
@@ -0,0 +1,10 @@
+ - dataset:
+     id: cais/hle
+     task_id: hle
+   value: 50.4
+   date: '2026-02-13'
+   source:
+     url: https://huggingface.co/zai-org/GLM-5
+     name: Model Card
+     user: SaylorTwift
+   notes: "With tools"
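
For reviewers who want to sanity-check entries like the three added above, here is a minimal sketch in Python. It is not part of the commit. The entries are transcribed by hand as dicts, and the nesting (`id` and `task_id` under `dataset`; `url`, `name`, and `user` under `source`) is an assumption inferred from the key names, since the diff view does not preserve indentation. The checks in the hypothetical `check_entry` helper are illustrative rules chosen for this sketch, not the Hub's actual validation logic.

```python
from datetime import date

# Entries transcribed by hand from the three files in this commit.
# Nesting is an assumption inferred from the key names.
RESULTS = {
    ".eval_results/gpqa.yaml": {
        "dataset": {"id": "Idavidrein/gpqa", "task_id": "diamond"},
        "value": 86.0,
        "date": "2026-02-13",
        "source": {"url": "https://huggingface.co/zai-org/GLM-5",
                   "name": "Model Card"},
    },
    ".eval_results/hle.yaml": {
        "dataset": {"id": "cais/hle", "task_id": "hle"},
        "value": 30.5,
        "date": "2026-02-13",
        "source": {"url": "https://huggingface.co/zai-org/GLM-5",
                   "name": "Model Card", "user": "SaylorTwift"},
    },
    ".eval_results/hle_with_tools.yaml": {
        "dataset": {"id": "cais/hle", "task_id": "hle"},
        "value": 50.4,
        "date": "2026-02-13",
        "source": {"url": "https://huggingface.co/zai-org/GLM-5",
                   "name": "Model Card", "user": "SaylorTwift"},
        "notes": "With tools",
    },
}


def check_entry(entry):
    """Return a list of problems in one entry (hypothetical rules for this sketch)."""
    problems = []
    if not entry.get("dataset", {}).get("id"):
        problems.append("missing dataset.id")
    if not isinstance(entry.get("value"), (int, float)):
        problems.append("value is not numeric")
    if "date" in entry:
        try:
            date.fromisoformat(entry["date"])
        except (ValueError, TypeError):
            problems.append("date is not ISO-8601")
    source = entry.get("source")
    if source is not None and not source.get("url"):
        problems.append("source given without url")
    return problems


def label(entry):
    """Build a one-line description such as 'cais/hle (hle): 30.5'."""
    ds = entry["dataset"]
    text = f"{ds['id']} ({ds['task_id']}): {entry['value']}"
    if "notes" in entry:
        text += f" [{entry['notes']}]"
    return text


if __name__ == "__main__":
    for path, entry in RESULTS.items():
        status = check_entry(entry) or "ok"
        print(path, "->", label(entry), "|", status)
```

The two HLE files share the same `dataset.id` and `task_id`, so in this sketch the `notes` field is the only thing that distinguishes the with-tools score from the plain one.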