Add evaluation results for GPQA, HLE

#22
by SaylorTwift HF Staff - opened

Evaluation Results

This PR adds evaluation results extracted from the Model Card.

Benchmarks:

  • HLE: 30.5
  • HLE (with tools): 50.4
  • GPQA: 86.0

Files created:

  • .eval_results/hle.yaml
  • .eval_results/hle_with_tools.yaml
  • .eval_results/gpqa.yaml
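
As a rough illustration of how the scores above map onto the three files, here is a minimal sketch. It is not the script that produced this PR. The `benchmark` and `value` keys are hypothetical placeholders, not the actual `.eval_results` schema, and the pairing of 50.4 with `hle_with_tools.yaml` is inferred from the order of the file list.

```python
from pathlib import PurePosixPath

# Scores as listed in this PR. Assigning 50.4 to "hle_with_tools" is an
# inference from the file names, not something the PR states explicitly.
SCORES = {
    "hle": 30.5,
    "hle_with_tools": 50.4,
    "gpqa": 86.0,
}


def render_eval_files(scores, root=".eval_results"):
    """Return a {path: yaml_text} mapping, one file per benchmark.

    The `benchmark` and `value` keys are hypothetical; the real schema
    used by the Hub may differ.
    """
    files = {}
    for name, value in scores.items():
        path = str(PurePosixPath(root) / f"{name}.yaml")
        files[path] = f"benchmark: {name}\nvalue: {value}\n"
    return files


if __name__ == "__main__":
    for path, text in render_eval_files(SCORES).items():
        print(path)
        print(text)
```

Keeping one file per benchmark variant, as this PR does, lets the plain and tool-assisted HLE scores be tracked separately.
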
ZHANGYUXUAN-zR changed pull request status to merged
