launch/ThinkPRM-1.5B

Tags: Text Generation, generative reward model, process supervision, chain-of-thought, code verification, text-generation-inference

mkhalifa committed on Apr 25

Commit 593f8de · verified · 1 Parent(s): b1687f2

Update README.md

README.md +3 -0

@@ -15,6 +15,9 @@ tags:
 ThinkPRM-14B is a generative Process Reward Model (PRM) based on the R1-Distill-Qwen-14B architecture. It is fine-tuned to perform step-by-step verification of reasoning processes (like mathematical solutions) by generating an explicit verification chain-of-thought (CoT) that involves labeling every step. It is designed to be highly data-efficient, requiring significantly less supervision data than traditional discriminative PRMs while achieving strong performance.
+Here's an example of the model output:
 ## Model Details
 ### Model Description