Update README.md
README.md
CHANGED
@@ -311,7 +311,7 @@ Evaluated using the CantTalkAboutThis Dataset as introduced in the CantTalkAbout
 
 The Nemotron-4 340B-Instruct model underwent extensive safety evaluation including adversarial testing via three distinct methods:
 - [Garak](https://docs.garak.ai/garak), is an automated LLM vulnerability scanner that probes for common weaknesses, including prompt injection and data leakage.
--
+- AEGIS, is a content safety evaluation dataset and LLM based content safety classifier model, that adheres to a broad taxonomy of 13 categories of critical risks in human-LLM interactions.
 - Human Content Red Teaming leveraging human interaction and evaluation of the models' responses.
 
 ### Limitations