- BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
  Paper • 2510.04721 • Published
- FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
  Paper • 2505.02735 • Published • 33
- PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
  Paper • 2504.18428 • Published
- MathConstruct: Challenging LLM Reasoning with Constructive Proofs
  Paper • 2502.10197 • Published
Shuo Xing
shuoxing
AI & ML interests
MLLMs, LLMs
Recent Activity
upvoted an article about 7 hours ago
Welcome Gemma 4: Frontier multimodal intelligence on device
updated a model 21 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
published a model 21 days ago
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4