R-PRM: Reasoning-Driven Process Reward Modeling
Shuaijie She
kevinpro
AI & ML interests
Reasoning, Chain of Thoughts, Alignment, Factual Consistency, Summarization
Recent Activity
liked a dataset 20 days ago: BAAI/Chinese-LiPS
liked a dataset 20 days ago: PleIAs/YouTube-Commons
new activity about 1 month ago: mispeech/GLAP: Model Weight
MAPO: Multilingual Reasoning with Preference Optimization
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
- Open Multilingual Reasoning Leaderboard
  🦊 Running • 5 • Display and search a leaderboard of math models
- MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
  Paper • 2401.06838 • Published
- kevinpro/MNumGLUESub
  Updated • 9
- kevinpro/MetaMathOctopus-MAPO-DPO-13B
  Text Generation • 13B • Updated • 115
R-PRM
R-PRM: Reasoning-Driven Process Reward Modeling
models 15
kevinpro/R-PRM-7B-DPO
Text Generation • 8B • Updated • 5 • 3
kevinpro/Hydra-LLaMA3-8B-0531-preview-Q4_K_M-GGUF
Text Generation • 8B • Updated • 8
kevinpro/MistralMathOctopus-7B
Text Generation • 7B • Updated • 953
kevinpro/MetaMathOctopus-MAPO-DPO-13B
Text Generation • 13B • Updated • 115
kevinpro/MathOctopus-MAPO-DPO-7B
Text Generation • 7B • Updated • 4
kevinpro/MetaMathOctopus-13B
Text Generation • 13B • Updated • 7
kevinpro/MetaMathOctopus-MAPO-DPO-7B
Text Generation • 7B • Updated • 2
kevinpro/MetaMathOctopus-7B
Text Generation • 7B • Updated • 4
kevinpro/MathOctopus-MAPO-DPO-13B
Text Generation • 13B • Updated • 2
kevinpro/MistralMathOctopus-MAPO-DPO-7B
Text Generation • 7B • Updated