refactor: Task3 reward model changed, agent adjusted for new model 48661cd ajaxwin committed 10 days ago