Add GPQA evaluation result
#241
by
by burtenshaw (HF Staff)
Evaluation Results
This PR adds structured evaluation results using the new .eval_results/ format.
What This Enables
- Model Page: Results appear on the model page with benchmark links
- Leaderboards: Scores are aggregated into benchmark dataset leaderboards
- Verification: Support for cryptographic verification of evaluation runs
Format Details
Results are stored as YAML files in the .eval_results/ folder. See the Eval Results Documentation for the full specification.
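As a rough illustration of the kind of file this PR adds, a result entry might look like the sketch below. The field names and values here are illustrative assumptions only; the authoritative schema is in the Eval Results Documentation.

```yaml
# Hypothetical sketch of a file such as .eval_results/gpqa.yaml.
# Field names are assumptions for illustration, not the published spec.
benchmark: gpqa          # benchmark dataset identifier
metric: accuracy         # metric reported for this run
value: 0.42              # placeholder score, not a real result
source: community-evals  # tool that produced the run
```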
Generated by community-evals
