LionGuard 2

LionGuard 2 is a multilingual content moderation classifier tuned for English/Singlish, Chinese, Malay, and Tamil in the Singapore context.

It feeds OpenAI’s text-embedding-3-large embeddings into a multi-head classifier, which returns fine-grained scores for the following categories:

  • Overall safety (binary)
  • Hate (hateful_l1, hateful_l2)
  • Insults (insults)
  • Sexual content (sexual_l1, sexual_l2)
  • Physical violence (physical_violence)
  • Self-harm (self_harm_l1, self_harm_l2)
  • Other misconduct (all_other_misconduct_l1, all_other_misconduct_l2)
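
To make the "embedding plus multi-head classifier" idea concrete, here is a minimal NumPy sketch. It is not LionGuard 2's actual architecture or weights: it assumes one independent linear head with a sigmoid per output key, and the random parameters, the `binary` key name for the overall safety score, and the `score_embedding` helper are all illustrative assumptions.

```python
import numpy as np

# Output keys taken from the category list above; "binary" is an ASSUMED
# key name for the overall safety score.
HEADS = [
    "binary",
    "hateful_l1", "hateful_l2",
    "insults",
    "sexual_l1", "sexual_l2",
    "physical_violence",
    "self_harm_l1", "self_harm_l2",
    "all_other_misconduct_l1", "all_other_misconduct_l2",
]

EMBED_DIM = 1536  # same value as `dimensions` in the Usage snippet

rng = np.random.default_rng(0)
# Hypothetical, untrained parameters: one weight vector and bias per head.
weights = {name: rng.normal(scale=0.01, size=EMBED_DIM) for name in HEADS}
biases = {name: 0.0 for name in HEADS}


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def score_embedding(embeddings):
    """Map an (n, EMBED_DIM) array to {head_name: array of n scores in (0, 1)}."""
    embeddings = np.asarray(embeddings, dtype=float)
    return {
        name: sigmoid(embeddings @ weights[name] + biases[name])
        for name in HEADS
    }


# A random stand-in vector is used here instead of a real OpenAI embedding.
fake_embedding = rng.normal(size=(1, EMBED_DIM))
scores = score_embedding(fake_embedding)
```

Each head reads the same embedding and produces its own score, which is why a single request can yield both the overall safety score and the per-category levels.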

Taxonomy

| S/N | Category | Level | Description | Example |
|---|---|---|---|---|
| 1 | Hate | 1 (Discriminatory) | Derogatory or generalized negative statements targeting a protected group. | “All Chinese people are so scheming.” “Malays are always late and lazy.” “Indians are too noisy; I avoid them whenever possible.” |
| 1 | Hate | 2 (Hate Speech) | Explicit calls for harm or violence against a protected group. | “We should kick all Chinese people out of the country.” “Burn down their temple…” “Kill the Indians in Little India; they don’t belong here.” |
| 2 | Insults | no sub-levels | Demeans, humiliates, or mocks without referencing a protected trait. | “You’re so dumb, you probably failed your PSLE.” “Your singing is worse than karaoke at a dodgy KTV.” |
| 3 | Sexual | 1 (Not appropriate for minors) | Mild-to-moderate sexual content. | “Let’s talk about how condoms work.” “We had a one-night stand last weekend.” |
| 3 | Sexual | 2 (Not appropriate for all ages) | Explicit or graphic sexual content. | “Always think they very yandao … but actually is just one ** bin.” |
| 4 | Physical Violence | no sub-levels | Glorification of violence or threats of physical harm. | “If I see him at Clarke Quay, I’m going to whack him.” “Someone should plant a bomb at that office…” |
| 5 | Self-Harm | 1 (Ideation) | Expressions of suicidal thoughts or encouragement of self-harm. | “I’m so stressed … I feel like ending it all.” “Failing my poly exams made me want to cut myself.” |
| 5 | Self-Harm | 2 (Action / Suicide) | Descriptions of ongoing or imminent self-harm behavior. | “I’ve locked myself in my room and taken a bunch of pills.” “I’m on the rooftop at Toa Payoh, ready to jump.” |
| 6 | All Other Misconduct | 1 (Generally not socially accepted) | Unethical or immoral behavior not necessarily illegal. | “Let’s spread fake rumours about her …” “How to secretly record someone’s private conversation?” |
| 6 | All Other Misconduct | 2 (Illegal activities) | Instructions or credible threats of serious harm; facilitation of crimes. | “Anyone know where to buy illegal knives in Geylang?” “Let’s hack that e-commerce site to get credit card details.” |
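
One way to turn per-category scores into human-readable flags is a lookup keyed by the output names listed earlier. The sketch below is only an illustration: the shape of the `scores` dictionary, the 0.5 threshold, the made-up score values, and the `flag_categories` helper are hypothetical and not part of the LionGuard 2 API.

```python
# Keys and labels come from the category list and taxonomy in this card;
# the helper and the threshold are illustrative assumptions.
TAXONOMY = {
    "hateful_l1": ("Hate", "1 (Discriminatory)"),
    "hateful_l2": ("Hate", "2 (Hate Speech)"),
    "insults": ("Insults", "no sub-levels"),
    "sexual_l1": ("Sexual", "1 (Not appropriate for minors)"),
    "sexual_l2": ("Sexual", "2 (Not appropriate for all ages)"),
    "physical_violence": ("Physical Violence", "no sub-levels"),
    "self_harm_l1": ("Self-Harm", "1 (Ideation)"),
    "self_harm_l2": ("Self-Harm", "2 (Action / Suicide)"),
    "all_other_misconduct_l1": ("All Other Misconduct", "1 (Generally not socially accepted)"),
    "all_other_misconduct_l2": ("All Other Misconduct", "2 (Illegal activities)"),
}


def flag_categories(scores, threshold=0.5):
    """Return (category, level) labels for every known key scoring >= threshold."""
    return [
        TAXONOMY[key]
        for key, score in scores.items()
        if key in TAXONOMY and score >= threshold
    ]


# Made-up scores for demonstration only; real values would come from the model.
example_scores = {"hateful_l1": 0.91, "hateful_l2": 0.12, "insults": 0.64}
flags = flag_categories(example_scores)
```

In practice the threshold would likely be tuned per category, since a level-2 flag usually warrants stricter handling than a level-1 flag.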

Usage

import os
import numpy as np
from transformers import AutoModel
from openai import OpenAI

# Load the model directly from the Hugging Face Hub
model = AutoModel.from_pretrained(
    "govtech/lionguard-2",
    trust_remote_code=True,
)

# Get OpenAI embeddings (set the OPENAI_API_KEY environment variable to your own key)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
response = client.embeddings.create(
    input="Hello, world!",  # replace with your own text
    model="text-embedding-3-large",
    dimensions=1536,  # dimensions of the embedding
)
embeddings = np.array([data.embedding for data in response.data])

# Run LionGuard 2 on the embeddings
results = model.predict(embeddings)
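
To moderate several texts at once, the embedding step above can be wrapped in a small helper. This is a sketch under stated assumptions: `embed_texts` is a hypothetical helper name, it relies on the OpenAI embeddings endpoint accepting a list of strings, and it assumes `model.predict` accepts one embedding row per text, as the 2-D array in the snippet above suggests.

```python
import numpy as np


def embed_texts(client, texts, model="text-embedding-3-large", dimensions=1536):
    """Embed a list of strings in one request; returns an (n, dimensions) array."""
    response = client.embeddings.create(
        input=list(texts),
        model=model,
        dimensions=dimensions,
    )
    # Sort by the returned index so rows line up with the input order.
    ordered = sorted(response.data, key=lambda item: item.index)
    return np.array([item.embedding for item in ordered])
```

With the `client` and `model` objects from the snippet above, usage would look like `results = model.predict(embed_texts(client, ["first text", "second text"]))`.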