CodeScout
RL-trained code search agents (1.7B, 4B, 14B) that outperform 2–18× larger models using only a Unix terminal. 📄 arxiv.org/abs/XXXX.XXXXX
OpenHands/CodeScout-14B
Text Generation • 15B • Updated
🏆 CodeScout-14B — strongest model, SOTA on SWE-Bench Verified/Pro/Lite
OpenHands/CodeScout-4B
Text Generation • 4B • Updated
⚡ CodeScout-4B — outperforms 8× larger Qwen3-32B across all benchmarks
OpenHands/CodeScout-1.7B
Text Generation • 2B • Updated
🔬 CodeScout-1.7B — post-RL checkpoint, outperforms 8× larger Qwen3-14B
OpenHands/CodeScout-1.7B-RFT
Text Generation • 2B • Updated
📦 CodeScout-1.7B-RFT — pre-RL (rejection fine-tuned) checkpoint
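The checkpoints above are standard Hugging Face text-generation repos, so they can be pulled with the `transformers` library. A minimal sketch, assuming the repo ids shown in this listing and default generation settings (device placement and dtype are illustrative choices, not from the release notes):

```python
# Map of checkpoint size -> Hub repo id, taken from the listing above.
CODESCOUT_REPOS = {
    "1.7B": "OpenHands/CodeScout-1.7B",
    "4B": "OpenHands/CodeScout-4B",
    "14B": "OpenHands/CodeScout-14B",
}


def load_codescout(size: str = "1.7B"):
    """Download and return (tokenizer, model) for the chosen checkpoint.

    Lazy import so the repo map is usable without `transformers` installed;
    `device_map="auto"` is an assumption for multi-GPU convenience.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = CODESCOUT_REPOS[size]
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")
    return tokenizer, model
```

In practice the 1.7B checkpoint fits on a single consumer GPU, while the 14B model typically needs quantization or multiple devices.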
OpenHands/CodeScout_Training_Rollouts
Viewer • Updated • 54.8k
🗂️ Training rollouts from SWE-Smith environments
OpenHands/CodeScout_Eval_Rollouts
Viewer • Updated • 12.7k
📊 Evaluation trajectories on SWE-Bench Verified, Pro, and Lite
OpenHands/SWE-smith-py-code-search
Viewer • Updated • 39.3k
🔍 SWE-Smith code search localization targets
OpenHands/SWE-Gym-code-search
Viewer • Updated • 2.32k
🔍 SWE-Gym code search localization targets
OpenHands/SWE-rebench-code-search
Viewer • Updated • 17.6k
🔍 SWE-rebench code search localization targets
OpenHands/SWE-bench_Verified-locagent
Viewer • Updated • 500
🎯 SWE-Bench Verified — localization ground truth
OpenHands/SWE-bench_Lite-locagent
Viewer • Updated • 300
🎯 SWE-Bench Lite — localization ground truth
OpenHands/SWE-bench_Pro-locagent
Viewer • Updated • 264
🎯 SWE-Bench Pro — localization ground truth
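All of the dataset repos above have the Hub viewer enabled, so they can also be loaded with the `datasets` library. A minimal sketch, assuming the repo ids from this listing; the `split="train"` argument is an assumption — check each dataset card for its actual split names:

```python
# Dataset repo ids, copied from the listing above.
CODESCOUT_DATASETS = [
    "OpenHands/CodeScout_Training_Rollouts",
    "OpenHands/CodeScout_Eval_Rollouts",
    "OpenHands/SWE-smith-py-code-search",
    "OpenHands/SWE-Gym-code-search",
    "OpenHands/SWE-rebench-code-search",
    "OpenHands/SWE-bench_Verified-locagent",
    "OpenHands/SWE-bench_Lite-locagent",
    "OpenHands/SWE-bench_Pro-locagent",
]


def load_rollouts(repo: str):
    """Fetch one of the listed dataset repos and return a split.

    Lazy import so the repo list is usable without `datasets` installed.
    The "train" split is a guess; some localization sets may use "test".
    """
    from datasets import load_dataset

    return load_dataset(repo, split="train")
```

Streaming mode (`load_dataset(repo, split="train", streaming=True)`) is a sensible option for the larger rollout sets, since it avoids downloading all 54.8k rows up front.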