davidlms committed · Commit b4b0163 · verified · 1 Parent(s): 83e1580

Add model-index with benchmark evaluations


Added structured evaluation results from README benchmark tables:

**Reasoning Benchmarks:**
- AIME25: 0.721
- AIME24: 0.775
- GPQA Diamond: 0.534
- LiveCodeBench: 0.548

**Instruct Benchmarks:**
- Arena Hard: 0.305
- WildBench: 56.8
- MATH Maj@1: 0.830
- MM MTBench: 7.83

**Base Model Benchmarks:**
- Multilingual MMLU: 0.652
- MATH CoT 2-Shot: 0.601
- AGIEval 5-shot: 0.511
- MMLU Redux 5-shot: 0.735
- MMLU 5-shot: 0.707
- TriviaQA 5-shot: 0.592

Total: 14 benchmarks across reasoning, instruction-following, and base capabilities.
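
As a quick check on that count (not part of this commit), the updated front matter can be parsed locally with PyYAML; the sketch below assumes a local checkout of this repo's README.md and should report 3 result groups and 14 metrics.

```python
# Sketch: parse the README.md front matter and count model-index metrics.
# Assumes a local checkout of this repo; PyYAML must be installed.
import yaml

with open("README.md", encoding="utf-8") as f:
    text = f.read()

# The YAML front matter sits between the first two '---' markers.
front_matter = text.split("---")[1]
meta = yaml.safe_load(front_matter)

results = meta["model-index"][0]["results"]
total_metrics = sum(len(group["metrics"]) for group in results)
print(f"{len(results)} result groups, {total_metrics} metrics")  # expect 3 and 14
```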

This enables the model to appear on leaderboards and makes it easier to compare it with other models.
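
For illustration only (not part of this commit), downstream tooling can read these results through the `huggingface_hub` library, which parses the model-index block into `ModelCardData.eval_results`; note the repo may be gated, so loading the card could require an access token.

```python
# Sketch: read the evaluation results exposed by the model-index metadata.
from huggingface_hub import ModelCard

card = ModelCard.load("mistralai/Ministral-3-3B-Instruct-2512")

# eval_results is None when a card has no model-index block.
for res in card.data.eval_results or []:
    print(res.dataset_name, res.metric_name, res.metric_value)
```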

Note: PR #6 only adds the `transformers` tag and doesn't conflict with this benchmark metadata addition.

Files changed (1)
  1. README.md +74 -3
README.md CHANGED
@@ -16,11 +16,82 @@ license: apache-2.0
 inference: false
 base_model:
 - mistralai/Ministral-3-3B-Base-2512
-extra_gated_description: >-
-  If you want to learn more about how we process your personal data, please read
-  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
+extra_gated_description: If you want to learn more about how we process your personal
+  data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
 tags:
 - mistral-common
+model-index:
+- name: Ministral-3-3B-Instruct-2512
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      name: Reasoning Benchmarks
+      type: benchmark
+    metrics:
+    - name: AIME25
+      type: aime_2025
+      value: 0.721
+    - name: AIME24
+      type: aime_2024
+      value: 0.775
+    - name: GPQA Diamond
+      type: gpqa_diamond
+      value: 0.534
+    - name: LiveCodeBench
+      type: live_code_bench
+      value: 0.548
+    source:
+      name: Model README - Reasoning Benchmarks
+      url: https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512
+  - task:
+      type: text-generation
+    dataset:
+      name: Instruct Benchmarks
+      type: benchmark
+    metrics:
+    - name: Arena Hard
+      type: arena_hard
+      value: 0.305
+    - name: WildBench
+      type: wild_bench
+      value: 56.8
+    - name: MATH Maj@1
+      type: math_maj1
+      value: 0.83
+    - name: MM MTBench
+      type: mm_mtbench
+      value: 7.83
+    source:
+      name: Model README - Instruct Benchmarks
+      url: https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512
+  - task:
+      type: text-generation
+    dataset:
+      name: Base Model Benchmarks
+      type: benchmark
+    metrics:
+    - name: Multilingual MMLU
+      type: multilingual_mmlu
+      value: 0.652
+    - name: MATH CoT 2-Shot
+      type: math_cot_2shot
+      value: 0.601
+    - name: AGIEval 5-shot
+      type: agieval_5shot
+      value: 0.511
+    - name: MMLU Redux 5-shot
+      type: mmlu_redux_5shot
+      value: 0.735
+    - name: MMLU 5-shot
+      type: mmlu_5shot
+      value: 0.707
+    - name: TriviaQA 5-shot
+      type: triviaqa_5shot
+      value: 0.592
+    source:
+      name: Model README - Base Model Benchmarks
+      url: https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512
 ---
 
 # Ministral 3 3B Instruct 2512
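
For reference, the same kind of metadata change can also be made programmatically instead of editing README.md by hand; the following is a hypothetical sketch using `huggingface_hub.metadata_update` (this is not how the commit was produced), shown with a single metric for brevity and requiring write access to the target repo.

```python
# Hypothetical sketch: push a model-index entry with huggingface_hub.metadata_update.
# Not how this commit was made; it only mirrors the metadata shape added above.
from huggingface_hub import metadata_update

model_index = [{
    "name": "Ministral-3-3B-Instruct-2512",
    "results": [{
        "task": {"type": "text-generation"},
        "dataset": {"name": "Reasoning Benchmarks", "type": "benchmark"},
        "metrics": [
            {"name": "AIME25", "type": "aime_2025", "value": 0.721},
            # ... remaining metrics as in the diff above ...
        ],
        "source": {
            "name": "Model README - Reasoning Benchmarks",
            "url": "https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512",
        },
    }],
}]

metadata_update(
    "mistralai/Ministral-3-3B-Instruct-2512",
    {"model-index": model_index},
    overwrite=True,  # replace any existing model-index entry
    commit_message="Add model-index with benchmark evaluations",
)
```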