VERL Math Datasets

sungyub's Collections

updated Nov 9

High-quality math reasoning datasets in VERL format

sungyub/math-verl-unified

Viewer • Updated Nov 9 • 2.27M • 62 • 1

Note: Unified collection of 2.1M deduplicated math problems (from 27.9M originals). Combines all 8 datasets below with 92.36% deduplication.
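As a quick sanity check on the figures in the note above, the sketch below recomputes the size of the unified collection. It assumes that "92.36% deduplication" means 92.36% of the 27.9M original problems were removed as duplicates; the function name is illustrative only, and the inputs are the rounded values quoted in the note.

```python
# Rounded figures quoted in the note (assumed interpretation:
# the deduplication rate is the fraction of originals removed).
ORIGINAL_PROBLEMS = 27_900_000  # "27.9M originals"
DEDUP_RATE = 0.9236             # "92.36% deduplication"


def remaining_after_dedup(original: int, removed_fraction: float) -> int:
    """Return how many items survive when removed_fraction of them are dropped."""
    return round(original * (1.0 - removed_fraction))


remaining = remaining_after_dedup(ORIGINAL_PROBLEMS, DEDUP_RATE)
print(f"{remaining:,} problems remain (about {remaining / 1e6:.1f}M)")
```

Under that assumption the result is roughly 2.1M, which matches the size stated in the note.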
sungyub/mathx-5m-verl

Viewer • Updated Nov 9 • 1.45M • 11

Note: Largest contributor (67.6%). From XenArcAI/MathX-5M with 94.6% deduplication.
sungyub/eurus-2-math-verl

Viewer • Updated Nov 9 • 412k • 21

Note: Second largest (14.9%). Diverse math reasoning from PRIME-RL/Eurus-2-RL-Data.
sungyub/big-math-rl-verl

Viewer • Updated Nov 9 • 242k • 23

Note: Third largest (10.5%). Includes solve_rate and domain metadata for curriculum learning.
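One way the solve_rate metadata could drive curriculum learning is to present easier problems first. The sketch below is a minimal illustration under stated assumptions: the record layout (prompt, solve_rate, domain as top-level keys) is hypothetical and may not match the dataset's actual schema, and solve_rate is assumed to be a fraction in [0, 1] where higher means easier.

```python
from typing import TypedDict


class Problem(TypedDict):
    # Hypothetical record layout, for illustration only.
    prompt: str
    solve_rate: float  # assumed: fraction of attempts solved, in [0, 1]
    domain: str


def curriculum_order(problems: list[Problem]) -> list[Problem]:
    """Return problems sorted easiest-first (highest solve_rate first)."""
    return sorted(problems, key=lambda p: p["solve_rate"], reverse=True)


# Made-up sample records, not taken from the dataset.
sample: list[Problem] = [
    {"prompt": "hard problem", "solve_rate": 0.1, "domain": "number theory"},
    {"prompt": "easy problem", "solve_rate": 0.9, "domain": "algebra"},
    {"prompt": "medium problem", "solve_rate": 0.5, "domain": "geometry"},
]

ordered = curriculum_order(sample)
for p in ordered:
    print(f'{p["solve_rate"]:.1f}  {p["domain"]}: {p["prompt"]}')
```

The same key function could be combined with a filter on domain to build per-topic curricula; whether easiest-first is the right schedule depends on the training setup.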
sungyub/deepmath-103k-verl

Viewer • Updated Nov 9 • 102k • 7
sungyub/skywork-or1-math-verl

Viewer • Updated Nov 9 • 103k • 36

Note: Includes model difficulty ratings. From Skywork/Skywork-OR1-RL-Data.
sungyub/openr1-math-verl

Viewer • Updated Nov 9 • 184k • 135

Note: Highest quality (0.44% duplicates). Includes problem_type metadata.
sungyub/orz-math-72k-verl

Viewer • Updated Nov 9 • 46.4k • 32

Note: Medium-scale dataset. Originally 72K problems; 33.5% were removed as duplicates.
sungyub/deepscaler-preview-verl

Viewer • Updated Nov 9 • 38k • 67

Note: Competition math (AIME, AMC, Omni-MATH). High preservation rate (94%).
sungyub/dapo-math-17k-verl

Viewer • Updated Nov 9 • 17.2k • 24

Note: Smallest dataset, but fully preserved. From BytedTsinghua-SIA/DAPO-Math-17k.