HEx-PHI: Human-Extended Policy-Oriented Harmful Instruction - a TrustSafeAI Collection

TrustSafeAI's Collections

Attention Tracker Demo for Prompt Injection Detection

HEx-PHI: Human-Extended Policy-Oriented Harmful Instruction

LLM LabSafety Benchmark

HEx-PHI: Human-Extended Policy-Oriented Harmful Instruction

Updated Dec 2, 2024

LLM-Tuning-Safety/HEx-PHI

Preview • Updated Aug 19, 2024 • 120 • 48