CodeScout
RL-trained code search agents (1.7B, 4B, 14B) that outperform 2–18× larger models using only a Unix terminal. 📄 arxiv.org/abs/XXXX.XXXXX
OpenHands/CodeScout-14B
Text Generation • 15B • Updated
🏆 CodeScout-14B — strongest model, SOTA on SWE-Bench Verified/Pro/Lite
OpenHands/CodeScout-4B
Text Generation • 4B • Updated
⚡ CodeScout-4B — outperforms 8× larger Qwen3-32B across all benchmarks
OpenHands/CodeScout-1.7B
Text Generation • 2B • Updated
🔬 CodeScout-1.7B — post-RL checkpoint, outperforms 8× larger Qwen3-14B
OpenHands/CodeScout-1.7B-RFT
Text Generation • 2B • Updated
📦 CodeScout-1.7B-RFT — pre-RL (rejection fine-tuned) checkpoint
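The checkpoints above are standard Hugging Face text-generation repos, so they can be pulled with the `transformers` library. A minimal sketch, assuming the repo ids shown in this listing and default generation settings (device placement and dtype are illustrative choices, not from the release notes):

```python
# Map of checkpoint size -> Hub repo id, taken from the listing above.
CODESCOUT_REPOS = {
    "1.7B": "OpenHands/CodeScout-1.7B",
    "4B": "OpenHands/CodeScout-4B",
    "14B": "OpenHands/CodeScout-14B",
}


def load_codescout(size: str = "1.7B"):
    """Download and return (tokenizer, model) for the chosen checkpoint.

    Lazy import so the repo map is usable without `transformers` installed;
    `device_map="auto"` is an assumption for multi-GPU convenience.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = CODESCOUT_REPOS[size]
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")
    return tokenizer, model
```

In practice the 1.7B checkpoint fits on a single consumer GPU, while the 14B model typically needs quantization or multiple devices.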
OpenHands/CodeScout_Training_Rollouts
Viewer • Updated • 54.8k
🗂️ Training rollouts from SWE-Smith environments
OpenHands/CodeScout_Eval_Rollouts
Viewer • Updated • 12.7k
📊 Evaluation trajectories on SWE-Bench Verified, Pro, and Lite
OpenHands/SWE-smith-py-code-search
Viewer • Updated • 39.3k
🔍 SWE-Smith code search localization targets
OpenHands/SWE-Gym-code-search
Viewer • Updated • 2.32k
🔍 SWE-Gym code search localization targets
OpenHands/SWE-rebench-code-search
Viewer • Updated • 17.6k
🔍 SWE-rebench code search localization targets
OpenHands/SWE-bench_Verified-locagent
Viewer • Updated • 500
🎯 SWE-Bench Verified — localization ground truth
OpenHands/SWE-bench_Lite-locagent
Viewer • Updated • 300
🎯 SWE-Bench Lite — localization ground truth
OpenHands/SWE-bench_Pro-locagent
Viewer • Updated • 264
🎯 SWE-Bench Pro — localization ground truth
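All of the dataset repos above have the Hub viewer enabled, so they can also be loaded with the `datasets` library. A minimal sketch, assuming the repo ids from this listing; the `split="train"` argument is an assumption — check each dataset card for its actual split names:

```python
# Dataset repo ids, copied from the listing above.
CODESCOUT_DATASETS = [
    "OpenHands/CodeScout_Training_Rollouts",
    "OpenHands/CodeScout_Eval_Rollouts",
    "OpenHands/SWE-smith-py-code-search",
    "OpenHands/SWE-Gym-code-search",
    "OpenHands/SWE-rebench-code-search",
    "OpenHands/SWE-bench_Verified-locagent",
    "OpenHands/SWE-bench_Lite-locagent",
    "OpenHands/SWE-bench_Pro-locagent",
]


def load_rollouts(repo: str):
    """Fetch one of the listed dataset repos and return a split.

    Lazy import so the repo list is usable without `datasets` installed.
    The "train" split is a guess; some localization sets may use "test".
    """
    from datasets import load_dataset

    return load_dataset(repo, split="train")
```

Streaming mode (`load_dataset(repo, split="train", streaming=True)`) is a sensible option for the larger rollout sets, since it avoids downloading all 54.8k rows up front.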