quinn
jwhe
AI & ML interests
None yet
Recent Activity
new activity about 3 hours ago
harborframework/parity-experiments: [Parity] CL-bench: codex/gpt-5.2 vs infer_codex.py (50 tasks, 3 trials, MATCHING)
authored a paper about 2 months ago
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks