yizhao-risk-zh-scorer

Introduction

This is a BERT model fine-tuned on a high-quality Chinese financial dataset. It assigns a security risk score to a piece of text. The score can be used to identify and remove risky content from financial datasets, which lowers the proportion of illegal or otherwise undesirable data. For the complete data-cleaning pipeline, see YiZhao.
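To show how a risk score can drive dataset cleaning, here is a minimal Python sketch of threshold-based filtering. It assumes that a higher score means higher risk, which is consistent with the benign Quickstart example scoring about 0.11 but is not stated explicitly. The function name `split_by_risk` and the threshold are hypothetical. `score_fn` stands for any callable that returns this model's score for one text, such as a wrapper around the Quickstart snippet. This is an illustration, not the YiZhao pipeline itself.

```python
from typing import Callable, List, Sequence, Tuple


def split_by_risk(
    texts: Sequence[str],
    score_fn: Callable[[str], float],
    threshold: float,
) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Partition texts into kept and removed lists by risk score.

    Assumption: higher score = riskier. Texts scoring >= threshold are removed.
    The threshold is hypothetical and should be tuned on labeled data.
    """
    kept: List[str] = []
    removed: List[Tuple[str, float]] = []
    for text in texts:
        score = score_fn(text)
        if score >= threshold:
            removed.append((text, score))  # keep the score for later auditing
        else:
            kept.append(text)
    return kept, removed
```

Returning the removed items together with their scores makes it easy to spot-check what the filter discards before committing to a threshold.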

Quickstart

The following snippet uses this model to compute the security risk score of a single text.

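import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

text = "你是一个聪明的机器人"  # "You are a smart robot"
# Hugging Face Hub repo id of this model; a local checkpoint directory also works
risk_model_name = "HIT-TMG/yizhao-risk-zh-scorer"

risk_tokenizer = AutoTokenizer.from_pretrained(risk_model_name)
risk_model = AutoModelForSequenceClassification.from_pretrained(risk_model_name)

risk_inputs = risk_tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
with torch.no_grad():  # inference only, so skip gradient tracking
    risk_outputs = risk_model(**risk_inputs)

# The model has a single output, so logits have shape (batch, 1); squeeze to (batch,)
risk_logits = risk_outputs.logits.squeeze(-1).float().numpy()

# One input text -> one scalar score
risk_score = risk_logits.item()

result = {
    "text": text,
    "risk_score": risk_score
}

print(result)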
# {'text': '你是一个聪明的机器人', 'risk_score': 0.11226219683885574}
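Scoring a large corpus one text at a time is slow. Below is a minimal batching sketch under stated assumptions: the helper names `iter_batches`, `make_model_scorer` and `score_texts` are hypothetical and not part of any library. The model call mirrors the Quickstart, and `padding="longest"` pads each batch to its longest text. The torch and transformers imports are deferred inside `make_model_scorer`, so the batching logic can be used and tested without them.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `texts` with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def make_model_scorer(model_name: str) -> Callable[[Sequence[str]], List[float]]:
    """Load tokenizer and model once; return a function that scores one batch."""
    import torch  # deferred: only needed when a real model is loaded
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    def score_batch(batch: Sequence[str]) -> List[float]:
        inputs = tokenizer(list(batch), return_tensors="pt",
                           padding="longest", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits  # shape (len(batch), 1)
        return logits.squeeze(-1).float().tolist()

    return score_batch


def score_texts(
    texts: Sequence[str],
    score_batch: Callable[[Sequence[str]], List[float]],
    batch_size: int = 32,
) -> List[float]:
    """Score all texts in order, calling `score_batch` once per batch."""
    scores: List[float] = []
    for batch in iter_batches(texts, batch_size):
        batch_scores = score_batch(batch)
        if len(batch_scores) != len(batch):
            raise ValueError("score_batch must return one score per text")
        scores.extend(batch_scores)
    return scores
```

With the real model, `score_texts(texts, make_model_scorer("HIT-TMG/yizhao-risk-zh-scorer"), batch_size=32)` should return one score per input text, in input order. The batch size is an illustrative default to tune for your hardware.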

Model details

102M parameters, F32 weights, stored in Safetensors format.
