BLAZE is a transformer-based bug localization model that works across programming languages and software projects. It enhances source-bug alignment using dynamic chunking and hard example learning, enabling precise bug localization in unseen codebases and programming languages.

How to use bug-localization/BLAZE with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="bug-localization/BLAZE", trust_remote_code=True)

# Or load the model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("bug-localization/BLAZE", trust_remote_code=True, dtype="auto")
```
BeetleBox is the largest curated dataset for bug localization:
π₯ Available on Zenodo
π Also listed on Hugging Face Datasets: bug-localization/BeetleBox
All code, usage instructions, model files, and scripts are available via:
π BLAZE Repository & Demo (Zenodo)
Please cite the following paper if you use BLAZE or BeetleBox in your work:
```bibtex
@article{Chakraborty2025,
  author    = {Chakraborty, Partha and Alfadel, Mahmoud and Nagappan, Meiyappan},
  title     = {BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning},
  journal   = {IEEE Transactions on Software Engineering},
  publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
  year      = {2025},
  pages     = {1--14},
  issn      = {2326-3881},
  doi       = {10.1109/TSE.2025.3579574},
  url       = {http://dx.doi.org/10.1109/TSE.2025.3579574}
}
```
Base model: codesage/codesage-base