Static quant of https://huggingface.co/allenai/Olmo-3-7B-Instruct

Model Description

  • Developed by: Allen Institute for AI (Ai2)
  • Model type: a Transformer-style autoregressive language model.
  • Language(s) (NLP): English
  • License: This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
  • Contact: Technical inquiries: olmo@allenai.org. Press: press@allenai.org
  • Date cutoff: Dec. 2024.

Evaluation

| Skill | Benchmark | Olmo 3 Instruct 7B SFT | Olmo 3 Instruct 7B DPO | Olmo 3 Instruct 7B | Qwen 3 8B (no reasoning) | Qwen 3 VL 8B Instruct | Qwen 2.5 7B | Olmo 2 7B Instruct | Apertus 8B Instruct | Granite 3.3 8B Instruct |
|---|---|---|---|---|---|---|---|---|---|---|
| Math | MATH | 65.1 | 79.6 | 87.3 | 82.3 | 91.6 | 71.0 | 30.1 | 21.9 | 67.3 |
| | AIME 2024 | 6.7 | 23.5 | 44.3 | 26.2 | 55.1 | 11.3 | 1.3 | 0.5 | 7.3 |
| | AIME 2025 | 7.2 | 20.4 | 32.5 | 21.7 | 43.3 | 6.3 | 0.4 | 0.2 | 6.3 |
| | OMEGA | 14.4 | 22.8 | 28.9 | 20.5 | 32.3 | 13.7 | 5.2 | 5.0 | 10.7 |
| Reasoning | BigBenchHard | 51.0 | 69.3 | 71.2 | 73.7 | 85.6 | 68.8 | 43.8 | 42.2 | 61.2 |
| | ZebraLogic | 18.0 | 28.4 | 32.9 | 25.4 | 64.3 | 10.7 | 5.3 | 5.3 | 17.6 |
| | AGI Eval English | 59.2 | 64.0 | 64.4 | 76.0 | 84.5 | 69.8 | 56.1 | 50.8 | 64.0 |
| Coding | HumanEvalPlus | 69.8 | 72.9 | 77.2 | 79.8 | 82.9 | 74.9 | 25.8 | 34.4 | 64.0 |
| | MBPP+ | 56.5 | 55.9 | 60.2 | 64.4 | 66.3 | 62.6 | 40.7 | 42.1 | 54.0 |
| | LiveCodeBench v3 | 20.0 | 18.8 | 29.5 | 53.2 | 55.9 | 34.5 | 7.2 | 7.8 | 11.5 |
| IF | IFEval | 81.7 | 82.0 | 85.6 | 86.3 | 87.8 | 73.4 | 72.2 | 71.4 | 77.5 |
| | IFBench | 27.4 | 29.3 | 32.3 | 29.3 | 34.0 | 28.4 | 26.7 | 22.1 | 22.3 |
| Knowledge | MMLU | 67.1 | 69.1 | 69.1 | 80.4 | 83.6 | 77.2 | 61.6 | 62.7 | 63.5 |
| QA | PopQA | 16.5 | 20.7 | 14.1 | 20.4 | 26.5 | 21.5 | 25.5 | 25.5 | 28.9 |
| | GPQA | 30.0 | 37.9 | 40.4 | 44.6 | 51.1 | 35.6 | 31.3 | 28.8 | 33.0 |
| Chat | AlpacaEval 2 LC | 21.8 | 43.3 | 40.9 | 49.8 | 73.5 | 23.0 | 18.3 | 8.1 | 28.6 |
| Tool Use | SimpleQA | 74.2 | 79.8 | 79.3 | 79.0 | 90.3 | 78.0 | – | – | – |
| | LitQA2 | 38.0 | 43.3 | 38.2 | 39.6 | 30.7 | 29.8 | – | – | – |
| | BFCL | 48.9 | 49.6 | 49.8 | 60.2 | 66.2 | 55.8 | – | – | – |
| Safety | Safety | 89.2 | 90.2 | 87.3 | 78.0 | 80.2 | 73.4 | 93.1 | 72.2 | 73.7 |

Model Details

Stage 1: SFT

Stage 2: DPO

Stage 3: RLVR

  • Reinforcement learning from verifiable rewards (RLVR) on the Dolci-Think-RL-7B dataset, which consists of math, code, instruction-following, and general chat queries.
  • Datasets: Dolci-Think-RL-7B, Dolci-Instruct-RL-7B
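The "verifiable rewards" used in this stage can be illustrated with a minimal answer checker: for a math query, the reward is 1.0 if the model's final answer matches the reference, else 0.0. The function names and the `\boxed{...}`-extraction heuristic below are illustrative assumptions, not the actual Olmo training code:

```python
import re

def extract_final_answer(completion: str) -> str:
    """Take the last \\boxed{...} span, or failing that the last number,
    as the model's final answer. Heuristic for illustration only."""
    boxed = re.findall(r"\\boxed\{([^}]*)\}", completion)
    if boxed:
        return boxed[-1].strip()
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else ""

def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: exact match against the reference answer."""
    return 1.0 if extract_final_answer(completion) == reference.strip() else 0.0

print(verifiable_reward("The answer is \\boxed{42}.", "42"))  # 1.0
print(verifiable_reward("I think it is 41.", "42"))           # 0.0
```

Because the reward is computed by a program rather than a learned reward model, it cannot be gamed by stylistic tricks, which is the appeal of RLVR for math, code, and instruction-following data.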

Inference & Recommended Settings

We evaluated our models with the following settings and recommend them for generation:

  • temperature: 0.6
  • top_p: 0.95
  • max_tokens: 32768
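Assuming a llama.cpp build (`llama-cli`) and a locally downloaded quant file (the filename below is illustrative, not an exact file in this repo), these settings map to:

```shell
# Filename is a placeholder; substitute the quant you downloaded.
llama-cli -m Olmo-3-7B-Instruct.Q4_K_M.gguf \
  --temp 0.6 \
  --top-p 0.95 \
  -n 32768 \
  -p "Explain the difference between SFT and DPO in one paragraph."
```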
GGUF Details

  • Model size: 7B params
  • Architecture: olmo2

  • Available quantizations: 4-bit, 5-bit, 6-bit, 8-bit, 16-bit


Model tree for firmanda/Olmo-3-7B-Think-GGUF
