WeijianQi1999 committed on
Commit d5645ee · 1 Parent(s): 6820b1a

upload new data

auto_o4-mini_Mind2Web-Online - Leaderboard_data.csv CHANGED
@@ -5,7 +5,8 @@ Browser Use,gpt-4o-2024-08-06,Browser Use,[OSU NLP](https://arxiv.org/abs/2504.0
  Claude Computer Use 3.5,Claude-3-5-sonnet-20241022,Anthropic,[OSU NLP](https://arxiv.org/abs/2504.01382),51.8,16.1,8.1,24,2025-5-11,True,,2024-10
  Agent-E,gpt-4o-2024-08-06,Emergence AI,[OSU NLP](https://arxiv.org/abs/2504.01382),51.8,23.1,6.8,27,2025-5-11,True,,2024-07
  Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,[OSU NLP](https://arxiv.org/abs/2504.01382),75.9,41.3,27,47.3,2025-5-11,True,,2025-02
- ACT-1,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,[Enhans](https://www.enhans.ai/),53.7,39.2,24.3,39.5,2025-7-16,True,,2025-07
+ ACT-1-20250703,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,[Enhans](https://www.enhans.ai/),53.7,39.2,24.3,39.5,2025-7-16,True,,2025-07
+ ACT-1-20250814,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,[Enhans](https://www.enhans.ai/),71.1,52.4,32.4,52.7,2025-8-23,True,,2025-08
  Eko-V2,Unknown,Fellou,[Fellou](https://fellou.ai/blog/post/eko20-launch/),95.0,76.0,70.0,78.0,2025-5-24,False,Unknown evaluation method,2025-05
  Eko-V1,Unknown,Fellou,[Fellou](https://fellou.ai/blog/post/eko20-launch/),-,-,-,31.0,2025-5-24,False,Unknown evaluation method,2025-05
  Seed1.5-VL,Seed1.5-VL,ByteDance,[ByteDance](https://arxiv.org/pdf/2505.07062),-,-,-,76.4,2025-5-11,False,Evaluated by WebJudge(GPT-4o),2025-05
human_Mind2Web-Online - Leaderboard_data.csv CHANGED
@@ -5,4 +5,5 @@ Browser Use,gpt-4o-2024-08-06,Browser Use,OSU NLP,55.4,26.6,8.1,30.0,2025-3-22
  Claude Computer Use 3.5,claude-3-5-sonnet-20241022,Anthropic,OSU NLP,56.6,20.3,14.9,29.0,2025-3-22
  Agent-E,gpt-4o-2024-08-06,Emergence AI,OSU NLP,49.4,26.6,6.8,28.0,2025-3-22
  Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,OSU NLP,90.4,49.0,32.4,56.3,2025-4-20
- ACT-1,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,Enhans,65.1,46.2,23.0,45.7,2025-7-16
+ ACT-1-20250703,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,Enhans,65.1,46.2,23.0,45.7,2025-7-16
+ ACT-1-20250814,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,Enhans,81.9,54.5,35.1,57.3,2025-8-23