davidlms committed · Commit b4b0163 · verified · 1 Parent(s): 83e1580

Add model-index with benchmark evaluations


Added structured evaluation results from README benchmark tables:

**Reasoning Benchmarks:**
- AIME25: 0.721
- AIME24: 0.775
- GPQA Diamond: 0.534
- LiveCodeBench: 0.548

**Instruct Benchmarks:**
- Arena Hard: 0.305
- WildBench: 56.8
- MATH Maj@1: 0.830
- MM MTBench: 7.83

**Base Model Benchmarks:**
- Multilingual MMLU: 0.652
- MATH CoT 2-Shot: 0.601
- AGIEval 5-shot: 0.511
- MMLU Redux 5-shot: 0.735
- MMLU 5-shot: 0.707
- TriviaQA 5-shot: 0.592

Total: 14 benchmarks across reasoning, instruction-following, and base capabilities.
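
As a quick check on that count (not part of this commit), the updated front matter can be parsed locally with PyYAML; the sketch below assumes a local checkout of this repo's README.md and should report 3 result groups and 14 metrics.

```python
# Sketch: parse the README.md front matter and count model-index metrics.
# Assumes a local checkout of this repo; PyYAML must be installed.
import yaml

with open("README.md", encoding="utf-8") as f:
    text = f.read()

# The YAML front matter sits between the first two '---' markers.
front_matter = text.split("---")[1]
meta = yaml.safe_load(front_matter)

results = meta["model-index"][0]["results"]
total_metrics = sum(len(group["metrics"]) for group in results)
print(f"{len(results)} result groups, {total_metrics} metrics")  # expect 3 and 14
```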

This enables the model to appear on leaderboards and makes it easier to compare it with other models.
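
For illustration only (not part of this commit), downstream tooling can read these results through the `huggingface_hub` library, which parses the model-index block into `ModelCardData.eval_results`; note the repo may be gated, so loading the card could require an access token.

```python
# Sketch: read the evaluation results exposed by the model-index metadata.
from huggingface_hub import ModelCard

card = ModelCard.load("mistralai/Ministral-3-3B-Instruct-2512")

# eval_results is None when a card has no model-index block.
for res in card.data.eval_results or []:
    print(res.dataset_name, res.metric_name, res.metric_value)
```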

Note: PR #6 only adds the `transformers` tag and doesn't conflict with this benchmark metadata addition.

Files changed (1)
  1. README.md +74 -3
README.md CHANGED
@@ -16,11 +16,82 @@ license: apache-2.0
 inference: false
 base_model:
 - mistralai/Ministral-3-3B-Base-2512
-extra_gated_description: >-
-  If you want to learn more about how we process your personal data, please read
-  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
+extra_gated_description: If you want to learn more about how we process your personal
+  data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
 tags:
 - mistral-common
+model-index:
+- name: Ministral-3-3B-Instruct-2512
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: Reasoning Benchmarks
+      type: benchmark
+    metrics:
+    - name: AIME25
+      type: aime_2025
+      value: 0.721
+    - name: AIME24
+      type: aime_2024
+      value: 0.775
+    - name: GPQA Diamond
+      type: gpqa_diamond
+      value: 0.534
+    - name: LiveCodeBench
+      type: live_code_bench
+      value: 0.548
+    source:
+      name: Model README - Reasoning Benchmarks
+      url: https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512
+  - task:
+      type: text-generation
+    dataset:
+      name: Instruct Benchmarks
+      type: benchmark
+    metrics:
+    - name: Arena Hard
+      type: arena_hard
+      value: 0.305
+    - name: WildBench
+      type: wild_bench
+      value: 56.8
+    - name: MATH Maj@1
+      type: math_maj1
+      value: 0.83
+    - name: MM MTBench
+      type: mm_mtbench
+      value: 7.83
+    source:
+      name: Model README - Instruct Benchmarks
+      url: https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512
+  - task:
+      type: text-generation
+    dataset:
+      name: Base Model Benchmarks
+      type: benchmark
+    metrics:
+    - name: Multilingual MMLU
+      type: multilingual_mmlu
+      value: 0.652
+    - name: MATH CoT 2-Shot
+      type: math_cot_2shot
+      value: 0.601
+    - name: AGIEval 5-shot
+      type: agieval_5shot
+      value: 0.511
+    - name: MMLU Redux 5-shot
+      type: mmlu_redux_5shot
+      value: 0.735
+    - name: MMLU 5-shot
+      type: mmlu_5shot
+      value: 0.707
+    - name: TriviaQA 5-shot
+      type: triviaqa_5shot
+      value: 0.592
+    source:
+      name: Model README - Base Model Benchmarks
+      url: https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512
 ---
 
 # Ministral 3 3B Instruct 2512
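
For reference, the same kind of metadata change can also be made programmatically instead of editing README.md by hand; the following is a hypothetical sketch using `huggingface_hub.metadata_update` (this is not how the commit was produced), shown with a single metric for brevity and requiring write access to the target repo.

```python
# Hypothetical sketch: push a model-index entry with huggingface_hub.metadata_update.
# Not how this commit was made; it only mirrors the metadata shape added above.
from huggingface_hub import metadata_update

model_index = [{
    "name": "Ministral-3-3B-Instruct-2512",
    "results": [{
        "task": {"type": "text-generation"},
        "dataset": {"name": "Reasoning Benchmarks", "type": "benchmark"},
        "metrics": [
            {"name": "AIME25", "type": "aime_2025", "value": 0.721},
            # ... remaining metrics as in the diff above ...
        ],
        "source": {
            "name": "Model README - Reasoning Benchmarks",
            "url": "https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512",
        },
    }],
}]

metadata_update(
    "mistralai/Ministral-3-3B-Instruct-2512",
    {"model-index": model_index},
    overwrite=True,  # replace any existing model-index entry
    commit_message="Add model-index with benchmark evaluations",
)
```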