# Llama-3.2-3B GRPO on GSM8K

  • Developed by: colesmcintosh
  • License: apache-2.0
  • Finetuned from model: unsloth/Llama-3.2-3B-Instruct

This Llama 3.2 3B model was fine-tuned with Group Relative Policy Optimization (GRPO) on the GSM8K dataset to improve its mathematical reasoning. Training ran 2x faster using Unsloth together with Hugging Face's TRL library.
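The core idea behind GRPO is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value baseline. A minimal sketch of that group-relative advantage computation (illustrative only; the actual training used TRL's `GRPOTrainer`, and the function name here is hypothetical):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of per-completion rewards into advantages.

    For one prompt, GRPO samples a group of completions, scores each
    (e.g. reward 1.0 if the GSM8K answer is correct, else 0.0), and
    uses (reward - group mean) / group std as the advantage signal.
    """
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Four sampled answers to one GSM8K question: two correct, two wrong.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions in the group receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward answers that beat their group's average.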

  • Format: Safetensors
  • Model size: 3B params
  • Tensor type: BF16
