Llama-3.2-3B GRPO on GSM8K
- Developed by: colesmcintosh
- License: apache-2.0
- Finetuned from model: unsloth/Llama-3.2-3B-Instruct
This Llama 3.2 3B model was fine-tuned using Group Relative Policy Optimization (GRPO) on the GSM8K dataset to improve its mathematical reasoning capabilities. Training ran 2x faster with Unsloth and Hugging Face's TRL library.
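The sketch below illustrates what GRPO training on GSM8K can look like with TRL's `GRPOTrainer`. The exact recipe (prompt format, reward functions, hyperparameters, and the Unsloth-specific setup) is not published here, so everything in the sketch beyond the dataset and base model names is an assumption for illustration only.

```python
# Illustrative GRPO training sketch on GSM8K using TRL's GRPOTrainer.
# The released model was trained with Unsloth for speed; the reward function
# and hyperparameters below are assumptions, not the author's exact recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def extract_answer(text):
    # GSM8K reference answers end with "#### <number>"
    return text.split("####")[-1].strip()

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {
        "prompt": [{"role": "user", "content": x["question"]}],
        "reference": extract_answer(x["answer"]),
    }
)

def correctness_reward(completions, reference, **kwargs):
    # Hypothetical reward: 1.0 if the completion mentions the reference answer.
    rewards = []
    for completion, ref in zip(completions, reference):
        text = completion[0]["content"]  # conversational completion format
        rewards.append(1.0 if ref in text else 0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="llama-3.2-3B-gsm8k-grpo",
    num_generations=8,          # group size used for the relative advantage
    max_completion_length=256,
    per_device_train_batch_size=8,
    learning_rate=5e-6,         # illustrative value
)

trainer = GRPOTrainer(
    model="unsloth/Llama-3.2-3B-Instruct",
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```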
Model tree for colesmcintosh/llama-3.2-3B-gsm8k
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Finetuned from: unsloth/Llama-3.2-3B-Instruct
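
A minimal inference sketch with the Transformers library is shown below; it assumes the tokenizer ships the standard Llama 3.2 instruct chat template.

```python
# Minimal inference sketch for colesmcintosh/llama-3.2-3B-gsm8k.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "colesmcintosh/llama-3.2-3B-gsm8k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A GSM8K-style word problem as the user prompt.
messages = [
    {
        "role": "user",
        "content": (
            "Natalia sold clips to 48 of her friends in April, and then she sold "
            "half as many clips in May. How many clips did Natalia sell altogether "
            "in April and May?"
        ),
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```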
