refactor: Update task configurations and grading logic for improved scoring and consistency dccaaac ajaxwin committed on 9 days ago
refactor: Task3 reward model changed, agent adjusted for new model 48661cd ajaxwin committed on 10 days ago
refactor: Update ActionType to include costs and modified grader for task 1 5235476 ajaxwin committed on 11 days ago
refactor: Rename submit_function to submit and remove asserts from eval.py 277ec6e ajaxwin committed on 13 days ago
refactor: Update grading logic and submission handling across tasks for improved accuracy and consistency cfae7a7 ajaxwin committed on 13 days ago
fix: Handle optional request body in reset endpoint and set default task ID a503619 ajaxwin committed on 14 days ago
fix: Update API responses to return JSON format and remove deprecated file references cfd3cfa ajaxwin committed on 14 days ago
fix: Update file paths and ensure model loading in PropertyRetriever 45bd962 ajaxwin committed on 15 days ago
fix import paths in app.py to reflect correct module structure 0304fd3 ajaxwin committed on 15 days ago