# BERT-based classifier (fine-tuned from rubert-tiny2)
The model was fine-tuned on several merged Russian toxicity datasets. The merged data was split into train, validation, and test sets in an 80/10/10 proportion; the metrics obtained on the test set (class 0 = non-toxic, class 1 = toxic) are as follows:
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.9827 | 0.9827 | 0.9827 | 21216 |
| 1 | 0.9272 | 0.9274 | 0.9273 | 5054 |
| accuracy | | | 0.9720 | 26270 |
| macro avg | 0.9550 | 0.9550 | 0.9550 | 26270 |
| weighted avg | 0.9720 | 0.9720 | 0.9720 | 26270 |
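
The table matches the layout of scikit-learn's `classification_report`. A minimal sketch of how such a report could be reproduced on a held-out test split, assuming placeholder `test_texts`/`test_labels` lists in place of the real merged-dataset split:

```python
import torch
from sklearn.metrics import classification_report
from transformers import AutoTokenizer, AutoModelForSequenceClassification

PATH = 'khvatov/ru_toxicity_detector'
tokenizer = AutoTokenizer.from_pretrained(PATH)
model = AutoModelForSequenceClassification.from_pretrained(PATH)
model.eval()

# Placeholder test split; substitute the real merged-dataset texts and labels.
test_texts = ["Марк был хороший", "..."]
test_labels = [0, 0]

preds = []
with torch.no_grad():
    # Process the texts in batches to keep memory bounded.
    for i in range(0, len(test_texts), 32):
        batch = tokenizer(test_texts[i:i + 32], return_tensors='pt',
                          truncation=True, padding=True)
        logits = model(**batch).logits
        preds.extend(logits.argmax(dim=1).tolist())

print(classification_report(test_labels, preds, digits=4))
```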
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

PATH = 'khvatov/ru_toxicity_detector'

tokenizer = AutoTokenizer.from_pretrained(PATH)
model = AutoModelForSequenceClassification.from_pretrained(PATH)

# Use the GPU when available; the model is small enough to run on CPU as well.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def get_toxicity_probs(text):
    """Return [P(non-toxic), P(toxic)] for a single string."""
    with torch.no_grad():
        inputs = tokenizer(text, return_tensors='pt',
                           truncation=True, padding=True).to(model.device)
        proba = torch.nn.functional.softmax(model(**inputs).logits, dim=1).cpu().numpy()
    return proba[0]

TEXT = "Марк был хороший"  # "Mark was good"
print(f'text = {TEXT}, probs={get_toxicity_probs(TEXT)}')
# text = Марк был хороший, probs=[0.9940585 0.00594147]
```
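
Because the tokenizer call already pads, the same pattern extends to lists of texts. A small sketch reusing the `tokenizer` and `model` objects above (the batch helper and the sample inputs are illustrative, not part of the card):

```python
def get_toxicity_probs_batch(texts):
    """Return an (N, 2) array of [non-toxic, toxic] probabilities."""
    with torch.no_grad():
        inputs = tokenizer(texts, return_tensors='pt',
                           truncation=True, padding=True).to(model.device)
        return torch.nn.functional.softmax(model(**inputs).logits, dim=1).cpu().numpy()

print(get_toxicity_probs_batch(["Марк был хороший", "Ты дурак"]))
```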
## Training
The model was trained with the Adam optimizer, a learning rate of 2e-5, and a batch size of 32 for 3 epochs.
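
The card does not ship the training script, but a minimal fine-tuning loop consistent with those hyperparameters might look like the sketch below. It assumes the rubert-tiny2 base checkpoint published on the Hub as `cointegrated/rubert-tiny2` and uses placeholder training pairs in place of the merged datasets:

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

BASE = 'cointegrated/rubert-tiny2'  # rubert-tiny2 base checkpoint
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=2).to(device)

# Placeholder (text, label) pairs; the real run used the merged toxicity datasets.
train_pairs = [("Марк был хороший", 0), ("...", 1)]

def collate(batch):
    """Tokenize a list of (text, label) pairs into one padded model input."""
    texts, labels = zip(*batch)
    enc = tokenizer(list(texts), return_tensors='pt', truncation=True, padding=True)
    enc['labels'] = torch.tensor(labels)
    return enc

loader = DataLoader(train_pairs, batch_size=32, shuffle=True, collate_fn=collate)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # cross-entropy computed from 'labels'
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```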