RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models
Abstract
RedBench presents a unified dataset with standardized risk categorization for evaluating LLM vulnerabilities across multiple domains and attack types.
As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount. However, existing red teaming datasets suffer from inconsistent risk categorizations, limited domain coverage, and outdated evaluations, hindering systematic vulnerability assessments. To address these challenges, we introduce RedBench, a universal dataset aggregating 37 benchmark datasets from leading conferences and repositories, comprising 29,362 samples across attack and refusal prompts. RedBench employs a standardized taxonomy with 22 risk categories and 19 domains, enabling consistent and comprehensive evaluations of LLM vulnerabilities. We provide a detailed analysis of existing datasets, establish baselines for modern LLMs, and open-source the dataset and evaluation code. Our contributions facilitate robust comparisons, foster future research, and promote the development of secure and reliable LLMs for real-world deployment. Code: https://github.com/knoveleng/redeval
Community
RedBench presents a unified dataset with standardized risk categorization for evaluating LLM vulnerabilities across multiple domains and attack types.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Automated Red-Teaming Framework for Large Language Model Security Assessment: A Comprehensive Attack Generation and Detection System (2025)
- TeleAI-Safety: A comprehensive LLM jailbreaking benchmark towards attacks, defenses, and evaluations (2025)
- CNFinBench: A Benchmark for Safety and Compliance of Large Language Models in Finance (2025)
- Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models (2025)
- AprielGuard (2025)
- OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs (2026)
- Red Teaming Large Reasoning Models (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 1
Spaces citing this paper 0
No Space linking this paper