A collection of benchmarks for evaluating LMs or VLMs under multi-turn interaction
Young-Jun Lee
passing2961
AI & ML interests
Social Dialogue System, Multi-Modal Dialogue
Recent Activity
Upvoted a paper about 12 hours ago: CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Upvoted a paper about 12 hours ago: Agentic Code Reasoning
Upvoted a paper about 12 hours ago: Tool Verification for Test-Time Reinforcement Learning