Llama-3.2-3B GRPO on GSM8K
- Developed by: colesmcintosh
- License: apache-2.0
- Finetuned from model: unsloth/Llama-3.2-3B-Instruct
This Llama 3.2 3B model was fine-tuned using Group Relative Policy Optimization (GRPO) on the GSM8K dataset to improve its mathematical reasoning capabilities. Training ran 2x faster with Unsloth and Hugging Face's TRL library.
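The sketch below illustrates what GRPO training on GSM8K can look like with TRL's `GRPOTrainer`. The exact recipe (prompt format, reward functions, hyperparameters, and the Unsloth-specific setup) is not published here, so everything in the sketch beyond the dataset and base model names is an assumption for illustration only.

```python
# Illustrative GRPO training sketch on GSM8K using TRL's GRPOTrainer.
# The released model was trained with Unsloth for speed; the reward function
# and hyperparameters below are assumptions, not the author's exact recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def extract_answer(text):
    # GSM8K reference answers end with "#### <number>"
    return text.split("####")[-1].strip()

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {
        "prompt": [{"role": "user", "content": x["question"]}],
        "reference": extract_answer(x["answer"]),
    }
)

def correctness_reward(completions, reference, **kwargs):
    # Hypothetical reward: 1.0 if the completion mentions the reference answer.
    rewards = []
    for completion, ref in zip(completions, reference):
        text = completion[0]["content"]  # conversational completion format
        rewards.append(1.0 if ref in text else 0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="llama-3.2-3B-gsm8k-grpo",
    num_generations=8,          # group size used for the relative advantage
    max_completion_length=256,
    per_device_train_batch_size=8,
    learning_rate=5e-6,         # illustrative value
)

trainer = GRPOTrainer(
    model="unsloth/Llama-3.2-3B-Instruct",
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```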
Model tree for colesmcintosh/llama-3.2-3B-gsm8k
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Finetuned from: unsloth/Llama-3.2-3B-Instruct
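
A minimal inference sketch with the Transformers library is shown below; it assumes the tokenizer ships the standard Llama 3.2 instruct chat template.

```python
# Minimal inference sketch for colesmcintosh/llama-3.2-3B-gsm8k.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "colesmcintosh/llama-3.2-3B-gsm8k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A GSM8K-style word problem as the user prompt.
messages = [
    {
        "role": "user",
        "content": (
            "Natalia sold clips to 48 of her friends in April, and then she sold "
            "half as many clips in May. How many clips did Natalia sell altogether "
            "in April and May?"
        ),
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```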
