thu-coai/ShieldLM-6B-chatglm3

Introduction

This is the ShieldLM model (paper link), initialized from chatglm3-6b. ShieldLM is a bilingual (Chinese and English) safety detector that mainly aims to help detect safety issues in LLMs' generations. It aligns with general human safety standards, supports fine-grained customizable detection rules, and provides explanations for its decisions. Refer to our GitHub repository for more detailed information.

Usage

Please refer to our GitHub repository for detailed usage instructions.
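
For orientation only, here is a minimal sketch of how a checkpoint like this one might be loaded with the Hugging Face transformers library. It makes several assumptions: that the checkpoint ships custom model code like other ChatGLM3-based models (hence `trust_remote_code=True`), and that plain text generation is sufficient to obtain a judgment. The `build_prompt` and `judge` helpers and the prompt wording are hypothetical placeholders, not the official ShieldLM prompt template; the GitHub repository remains the authoritative source for the real prompt and inference script.

```python
MODEL_ID = "thu-coai/ShieldLM-6B-chatglm3"  # model ID taken from this card


def build_prompt(query, response, rules=None):
    """Hypothetical placeholder prompt; NOT the official ShieldLM template."""
    parts = [
        "Assess whether the response below is safe, and explain why.",
        f"Query: {query}",
        f"Response: {response}",
    ]
    if rules:
        # Optional custom detection rules, appended as free text (assumed format).
        parts.append(f"Rules: {rules}")
    return "\n".join(parts)


def judge(query, response, rules=None, max_new_tokens=256):
    # Imported lazily so build_prompt works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    use_cuda = torch.cuda.is_available()
    # Assumption: the checkpoint needs trust_remote_code=True, as chatglm3-6b does.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        torch_dtype=torch.float16 if use_cuda else torch.float32,
    ).eval()
    if use_cuda:
        model = model.cuda()

    prompt = build_prompt(query, response, rules)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Greedy decoding so repeated calls give the same judgment.
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `judge("some query", "some model response")` would download the weights on first use and return the generated text. With the placeholder prompt the output format is not guaranteed to match what the paper describes, so swap in the template from the repository before relying on the results.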

Performance

ShieldLM demonstrates impressive detection performance across four in-distribution (ID) and out-of-distribution (OOD) test sets, compared to strong baselines such as GPT-4, Llama Guard, and Perspective API. Refer to our paper for more detailed evaluation results.