Learning to Repair Lean Proofs from Compiler Feedback • Paper • 2602.02990 • Published 15 days ago • 27
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation • Paper • 2409.06820 • Published Sep 10, 2024 • 68