arxiv:2606.09697
Federico Torrielli
EvilScript
AI & ML interests
AI Safety & Mechanistic Interpretability
Recent Activity
authored a paper about 11 hours ago
PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models