Indo Spam Chatbot

Model Overview

Indo Spam Chatbot is a fine-tuned spam detection model based on the Gemma 2 2B architecture. It is designed to identify spam messages in WhatsApp chatbot interactions and was fine-tuned on a dataset of 40,000 spam messages collected over one year. The dataset includes two labels (see the label-mapping sketch below):

  • Spam
  • Non-spam

The model supports detecting spam across multiple categories, such as:

  • Offensive and abusive words
  • Profane language
  • Gibberish words and numbers
  • Spam links
  • And more
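
The card does not state which class index corresponds to which label. A quick way to check is to inspect the checkpoint's configuration; the sketch below assumes the config exposes an id2label mapping, and the names it prints may differ from the labels listed above (e.g. generic LABEL_0 / LABEL_1).

from transformers import AutoConfig

# Load only the config to see how class indices map to labels.
# The mapping printed here is whatever the checkpoint ships with;
# verify it before interpreting an argmax index as "spam" or "non-spam".
config = AutoConfig.from_pretrained('kasyfilalbar/indo-spam-chatbot')
print(config.id2label)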

How To Use

Using this model is straightforward once you have transformers installed:

pip install -U transformers

Then you can use the model like this:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Example spam sentences
sentences = ["adsfwcasdfad", 
             "kak bisa depo di link ini: http://dewa.site/dewa/dewi", 
             "p", 
             "1234"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('kasyfilalbar/indo-spam-chatbot')
model = AutoModelForSequenceClassification.from_pretrained('kasyfilalbar/indo-spam-chatbot', device_map="auto")

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    # Move inputs to the same device as the model
    # (device_map="auto" may place it on GPU or CPU)
    encoded_input = encoded_input.to(model.device)
    model_output = model(**encoded_input)
    logits = model_output.logits
    # Predicted class index for each sentence
    labels = torch.argmax(logits, dim=1)

print(labels.tolist())
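
As an alternative to the manual tokenize-and-forward steps above, the same checkpoint can be wrapped in a text-classification pipeline. This is a minimal sketch, assuming the checkpoint loads cleanly with the default pipeline settings; the label strings it returns depend on the id2label mapping in the model config.

from transformers import pipeline

# Wrap the checkpoint in a text-classification pipeline.
# Returned label strings come from the config's id2label mapping.
classifier = pipeline('text-classification', model='kasyfilalbar/indo-spam-chatbot')
results = classifier(["adsfwcasdfad",
                      "kak bisa depo di link ini: http://dewa.site/dewa/dewi"])
print(results)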

Repository

For more information about the code, visit https://github.com/Kasyfil97/indo-spam-chatbot
