Added Evaluation Benchmarks to Metadata

#34
Files changed (1)
  1. README.md +54 -0
README.md CHANGED
@@ -4,6 +4,60 @@ library_name: transformers
  base_model:
  - deepseek-ai/DeepSeek-V3.2-Exp-Base
  base_model_relation: finetune
+ model-index:
+ - name: DeepSeek-V3.2-Exp
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       name: Benchmarks
+       type: benchmark
+     metrics:
+     - name: MMLU-Pro
+       type: mmlu-pro
+       value: 85.0
+     - name: GPQA-Diamond
+       type: gpqa-diamond
+       value: 79.9
+     - name: Humanity's Last Exam
+       type: humanity's_last_exam
+       value: 19.8
+     - name: LiveCodeBench
+       type: livecodebench
+       value: 74.1
+     - name: AIME 2025
+       type: aime_2025
+       value: 89.3
+     - name: HMMT 2025
+       type: hmmt_2025
+       value: 83.6
+     - name: Codeforces
+       type: codeforces
+       value: 2121.0
+     - name: Aider-Polyglot
+       type: aider-polyglot
+       value: 74.5
+     - name: BrowseComp
+       type: browsecomp
+       value: 40.1
+     - name: BrowseComp-zh
+       type: browsecomp-zh
+       value: 47.9
+     - name: SimpleQA
+       type: simpleqa
+       value: 97.1
+     - name: SWE Verified
+       type: swe_verified
+       value: 67.8
+     - name: SWE-bench Multilingual
+       type: swe-bench_multilingual
+       value: 57.9
+     - name: Terminal-bench
+       type: terminal-bench
+       value: 37.7
+     source:
+       name: Model README
+       url: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp
  ---
  # DeepSeek-V3.2-Exp
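The added `model-index` block sits in the README's YAML front matter, between the `---` delimiters and above the `# DeepSeek-V3.2-Exp` heading. As a rough sketch of how a reviewer might isolate that block for inspection, the helper below splits a README string into its front matter and body using only the standard library. The function name `split_front_matter` and the shortened sample text are hypothetical and not part of this PR; the sketch assumes the opening `---` is the first line of the file.

```python
from __future__ import annotations


def split_front_matter(text: str) -> tuple[str, str]:
    """Split README text into (front_matter, body).

    Assumes the metadata block starts on the first line and is delimited
    by lines that contain only '---'. Returns ("", text) otherwise.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Everything between the two delimiters is the front matter.
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    # No closing delimiter found: treat the whole text as body.
    return "", text


# Shortened, hypothetical README used only for illustration.
sample = """---
library_name: transformers
base_model_relation: finetune
model-index:
- name: DeepSeek-V3.2-Exp
---
# DeepSeek-V3.2-Exp
"""

front_matter, body = split_front_matter(sample)
print(front_matter)
print(body)
```

The returned front matter string can then be passed to any YAML parser to check that each metric entry carries a `name`, `type`, and `value`.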