In a Training Loop 🔄

Zihan Ma

MichaelErchi

https://mazihan880.github.io/

AI & ML interests

None yet

Recent Activity

new activity 2 days ago

opencompass/CodeForce_SAGA:Update README.md

authored a paper 21 days ago

How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

authored a paper 21 days ago

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Organizations

New activity in opencompass/CodeForce_SAGA 2 days ago

Update README.md

#4 opened 2 days ago by

MichaelErchi

authored 2 papers 21 days ago

How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

Paper • 2511.08487 • Published 29 days ago • 2

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Paper • 2511.14366 • Published 23 days ago • 15

upvoted 2 papers 21 days ago

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Paper • 2511.14366 • Published 23 days ago • 15

How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

Paper • 2511.08487 • Published 29 days ago • 2

authored a paper 4 months ago

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21 • 256

upvoted 2 papers 4 months ago

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21 • 256

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Paper • 2508.03686 • Published Aug 5 • 37

liked 2 datasets 4 months ago

opencompass/CodeCompass

Updated Aug 1 • 98 • 1

opencompass/CodeForce_SAGA

Viewer • Updated Aug 1 • 5.57k • 428 • 1

New activity in opencompass/CodeForce_SAGA 4 months ago

Update metadata: task category and add library name

#2 opened 5 months ago by

nielsr

New activity in opencompass/CodeCompass 4 months ago

Improve dataset card: Update task category, add library_name, link paper

#1 opened 5 months ago by

nielsr

updated 2 datasets 5 months ago

opencompass/CodeForce_SAGA

Viewer • Updated Aug 1 • 5.57k • 428 • 1

opencompass/CodeCompass

Updated Aug 1 • 98 • 1

published a dataset 5 months ago

opencompass/CodeForce_SAGA

Viewer • Updated Aug 1 • 5.57k • 428 • 1

authored a paper 5 months ago

Rethinking Verification for LLM Code Generation: From Generation to Testing

Paper • 2507.06920 • Published Jul 9 • 28

upvoted a paper 5 months ago

Rethinking Verification for LLM Code Generation: From Generation to Testing

Paper • 2507.06920 • Published Jul 9 • 28

commented a paper 5 months ago

Rethinking Verification for LLM Code Generation: From Generation to Testing

Paper • 2507.06920 • Published Jul 9 • 28

updated 2 datasets 5 months ago

opencompass/CodeCompass

Updated Aug 1 • 98 • 1

opencompass/CodeCompass

Updated Aug 1 • 98 • 1
