denizgulal's Collections
safety & alignment

updated Sep 26

  • JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

    Paper • 2404.01318 • Published Mar 28, 2024

  • MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

    Paper • 2406.01574 • Published Jun 3, 2024 • 51

  • Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment

    Paper • 2508.07750 • Published Aug 11, 2025 • 19