SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models Paper • 2502.12464 • Published 20 days ago • 27
Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems Paper • 2410.13334 • Published Oct 17, 2024 • 13
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models Paper • 2410.01524 • Published Oct 2, 2024 • 3