Introducing the github-top-code dataset: A curated dataset of 1.3M+ source code files from GitHub's top ranked developers.
I collected the best source code files from Github's highest trending developers of all time, and compiled a dataset to train LLMs to write well-structured, production-grade code.
AgentCPM-Explore🔥 on device agent foundation model released by OpenBMB openbmb/AgentCPM-Explore ✨ 4B - Apache2.0 ✨ Supports 100+ multi-turn environment interactions with search + verification ✨ Full training/inference stack is openly shared as well