Trained ExpRL checkpoints. Paper link: https://arxiv.org/abs/2606.17024
Violet Xiang PRO
violetxi
AI & ML interests
None yet
Recent Activity
upvoted a paper about 6 hours ago
GLM-5: from Vibe Coding to Agentic Engineering
updated a model about 7 hours ago
violetxi/opd_tooluse_qwen3-4b_trained_teacher_forward_kl_bs256_lr5e-6
published a model about 7 hours ago
violetxi/opd_tooluse_qwen3-4b_trained_teacher_forward_kl_bs256_lr5e-6