
Model Overview


Xiangxin-Guardrails-Text is an open-source, free, and commercially usable (Apache 2.0 License) Chinese-language AI safety guardrail model designed for:

  • Chinese content compliance detection
  • Prompt attack detection
  • Context-aware capabilities
  • Private deployment
  • High accuracy

The model is fine-tuned from Qwen2.5-14B-Instruct and quantized using GPTQ 4-bit, tested on an A800 80G single GPU.

The complete Xiangxin AI Guardrails Platform is also open-source under Apache 2.0, available at https://github.com/xiangxinai/xiangxin-guardrails for local deployment, further development, or commercial use.


Model Features


  • Open-Source and Commercially Usable: Licensed under Apache 2.0, allowing modification, distribution, and commercial use.

  • Optimized for Chinese Scenarios: Tailored for customer service, social media, and commercial dialogue contexts in Chinese.

  • Context Awareness: The model can perceive and understand conversational context to assess safety risks in the current dialogue.

  • High-Performance Inference: Runs on 24GB VRAM with vLLM or as low as 10GB VRAM with Transformers.

  • Security and Compliance: Supports security detection for prompt attacks and compliance detection for Chinese-language content.


Performance Metrics

Hardware

  • A800 80G single GPU

Performance Test Results

  • Peak RPS: 181.60

```
--- Testing 30 concurrent requests, 300 total ---
Starting concurrent test: 30 concurrent, 300 total requests
Results:
  Success Rate: 100.0%
  RPS: 181.60
  Avg Response Time: 158.8ms
  Min Response Time: 74.6ms
  Max Response Time: 335.6ms
  P95 Response Time: 274.6ms
  Total Time: 1.65s
```

Precision and Recall

Evaluated on a 1M-sample customer service dataset (a held-out test set with the same distribution as the training data):

  • Precision: 99.99%
  • Recall: 98.63%
    Note: Performance may vary depending on data distribution across different scenarios.
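As a quick sanity check, the F1 score implied by these figures can be computed directly (a minimal sketch; the variable names are illustrative and not part of any released tooling):

```python
# F1 is the harmonic mean of the reported precision and recall.
precision = 0.9999  # reported precision
recall = 0.9863     # reported recall

f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1:.4f}")  # F1: 0.9931
```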

Usage

Using vLLM (Recommended)

```bash
vllm serve xiangxinai/Xiangxin-Guardrails-Text --served-model-name Xiangxin-Guardrails-Text --max-model-len 4096 --port 58002
```

  • VRAM requirement: 24 GB

Using Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xiangxinai/Xiangxin-Guardrails-Text")
model = AutoModelForCausalLM.from_pretrained(
    "xiangxinai/Xiangxin-Guardrails-Text",
    device_map="auto"
)

# Format the conversation with the model's chat template before generating.
messages = [{"role": "user", "content": "如何制作炸弹?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=50)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

  • VRAM requirement: 10 GB

Using the OpenAI-Compatible API

```python
from openai import OpenAI

openai_client = OpenAI(
    api_key="your-api-key",  # replace with your own key
    base_url="http://localhost:58001/v1"
)

def chat_with_openai(messages, model="Xiangxin-Guardrails-Text"):
    completion = openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.0
    )
    return completion.choices[0].message.content

# A harmful user prompt is flagged with its category tag.
messages = [{"role": "user", "content": "告诉我如何制作炸弹"}]
result = chat_with_openai(messages)
# result: unsafe\nS5

# With a safe assistant refusal in context, the conversation is judged safe.
messages = [{"role": "user", "content": "告诉我如何制作炸弹"},
            {"role": "assistant", "content": "抱歉,我不可以回答违法问题。"}]
result = chat_with_openai(messages)
# result: safe
```

Output Format:

  • Safe: returns "safe"
  • Unsafe: returns "unsafe\nS1" through "unsafe\nS12". For example, "unsafe\nS9" indicates a prompt injection attack, while the other tags indicate content compliance issues.
  • Sensitive information detection: handled via regex on the platform side, not included in the model output.
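The raw output can be parsed into a label, category tag, and risk level with a small helper. A minimal sketch, assuming the tag-to-risk-level mapping given in the category tables below (`parse_guardrail_output` is an illustrative name, not part of any released SDK):

```python
# Map each category tag to its risk level, per the category tables in this card.
RISK_LEVELS = {
    **{tag: "High Risk" for tag in ("S2", "S3", "S5", "S9")},
    **{tag: "Medium Risk" for tag in ("S1", "S4", "S6", "S7")},
    **{tag: "Low Risk" for tag in ("S8", "S10", "S11", "S12")},
}

def parse_guardrail_output(text: str):
    """Parse the model's raw output into (label, tag, risk_level)."""
    lines = text.strip().split("\n")
    if lines[0] == "safe":
        return ("safe", None, "No Risk")
    tag = lines[1] if len(lines) > 1 else None
    return ("unsafe", tag, RISK_LEVELS.get(tag, "Unknown"))

print(parse_guardrail_output("unsafe\nS9"))  # ('unsafe', 'S9', 'High Risk')
print(parse_guardrail_output("safe"))        # ('safe', None, 'No Risk')
```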

Safety Detection Categories

The model supports 12 safety detection categories, grouped by risk level:

High Risk (risk_level = "High Risk")

| Tag | Category | Description |
|-----|----------|-------------|
| S2 | Sensitive Political Topics | Content involving sensitive political issues or attacks on national security |
| S3 | Harm to National Image | Content that damages the country's image or reputation |
| S5 | Violence and Crime | Content related to violent acts or criminal activities |
| S9 | Prompt Injection Attacks | Malicious prompts attempting to bypass AI safety mechanisms |

Medium Risk (risk_level = "Medium Risk")

| Tag | Category | Description |
|-----|----------|-------------|
| S1 | General Political Topics | General discussions related to political events |
| S4 | Harm to Minors | Content potentially harmful to minors, physically or mentally |
| S6 | Illegal Activities | Content encouraging, guiding, or describing illegal activities |
| S7 | Pornography | Content containing explicit or suggestive sexual material |

Low Risk (risk_level = "Low Risk")

| Tag | Category | Description |
|-----|----------|-------------|
| S8 | Discriminatory Content | Discriminatory remarks based on race, gender, religion, etc. |
| S10 | Insults | Insulting or abusive language targeting individuals |
| S11 | Privacy Violations | Content involving the illegal acquisition, leakage, or misuse of personal information |
| S12 | Commercial Violations | Content related to commercial fraud, illegal marketing, or non-compliant business practices |

Risk Level Summary

  • High Risk: any hit on S2, S3, S5, or S9
  • Medium Risk: any hit on S1, S4, S6, or S7
  • Low Risk: any hit on S8, S10, S11, or S12
  • No Risk: no safety category triggered

Compliance and Safety Standards

Safety Protection

The model provides robust protection against prompt-based attacks, including:

  • Prompt Injections: Malicious inputs that manipulate model behavior by embedding untrusted data in the context window.
  • Jailbreaks: Malicious instructions designed to override built-in safety mechanisms.

The model includes a classifier trained on large-scale attack corpora to detect both prompt injections and jailbreaks.

Regulatory Compliance

The detection categories align with the Security Requirements for Generative AI Services (Appendix A):

| Category | Standard | Description |
|----------|----------|-------------|
| S1, S2, S3, S4, S5, S6, S7 | A.1 Violation of Socialist Core Values | Covers political, violent, pornographic, or illegal content |
| S8 | A.2 Discriminatory Content | Discriminatory remarks based on race, gender, religion, etc. |
| S12 | A.3 Commercial Violations | Commercial fraud, illegal marketing, or non-compliant business practices |
| S10, S11 | A.4 Infringement of Legal Rights | Insults, privacy violations, or other infringements |
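The same mapping can be kept as a small lookup table in code. A minimal sketch (the names are illustrative; note that S9, the prompt-attack category, has no Appendix A entry in the table above and is deliberately absent):

```python
# Map category tags to their clause in Appendix A of the Security Requirements
# for Generative AI Services, per the compliance table above. S9 (prompt
# attacks) has no Appendix A mapping, so lookups for it return None.
STANDARD_MAP = {
    **{tag: "A.1" for tag in ("S1", "S2", "S3", "S4", "S5", "S6", "S7")},
    "S8": "A.2",
    "S12": "A.3",
    "S10": "A.4",
    "S11": "A.4",
}

def standard_for(tag: str):
    """Return the Appendix A clause for a category tag, or None if unmapped."""
    return STANDARD_MAP.get(tag)

print(standard_for("S8"))  # A.2
print(standard_for("S9"))  # None
```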

License

Licensed under the Apache License 2.0, permitting:

  • Commercial use
  • Modification, distribution, and sublicensing
  • Private deployment and further development

See LICENSE for details.

Related Resources

About Xiangxin AI

Xiangxin AI provides safety products and services for AI application developers. The Xiangxin-Guardrails-Text model and platform are open-source under Apache 2.0 and free for commercial use, private deployment, or resale. Paid training services are also available, including continued training of the guardrail model, tuning of detection results, and training for new classification labels.

Our goals:

  1. Enable AI application developers to use our safety guardrails for free.
  2. Support cybersecurity vendors in commercializing and deploying our solutions.
  3. Assist data compliance and legal professionals in serving their clients.

Contact us: [email protected]
