corall88 committed
Commit 8cb1362 · verified · 1 parent: dbcb4ee

Update README.md

Files changed (1)
  1. README.md +56 -47
README.md CHANGED
@@ -15,52 +15,61 @@ pipeline_tag: text-classification
  tags:
  - spam
  - detection
  library_name: transformers
  ---
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** corall88
- - **Shared by:** corall88
- - **Model type:** Text classification
- - **Language(s) (NLP):** russian, ru
- - **License:** cc by-nc-nd v.4
-
- ## Usage
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## Training Details
-
- [More Information Needed]
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
  tags:
  - spam
  - detection
+ - classification
+ - russian
  library_name: transformers
  ---
+ # russian_spam_detector
+
+ The **russian_spam_detector** model performs binary classification of texts into 2 categories:
+ - **LABEL_0**: spam message
+ - **LABEL_1**: normal message (not spam)
+
+ ## 🚀 Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+
+ model_name = "corall88/russian_spam_detector"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ detector = pipeline("text-classification", model=model, tokenizer=tokenizer)
+
+ message = "Поздравляем! Вы выиграли 1000000 рублей, пройдите по ссылке - ..."  # "Congratulations! You won 1000000 rubles, follow the link - ..."
+ prediction = detector(message)
+ print(prediction)  # a list of {'label': ..., 'score': ...} dicts
+ ```
+
+ ## 📊 Dataset
+ The model was fine-tuned on the [telegram-spam](https://huggingface.co/datasets/alt-gnome/telegram-spam) dataset, which contains spam messages.
+
+ ## 🧠 Architecture
+ The model is based on **[RuModernBERT-base](https://huggingface.co/ModernBERT-base)** and fine-tuned for binary classification.
+
+ ## ⚙️ Training parameters
+ - **Epochs**: 4
+ - **Batch size**: 16
+ - **Optimizer**: AdamW
+ - **Learning rate**: 2e-5
+ - **Loss**: CrossEntropyLoss
+ - **Max sequence length**: 256
+
+ ## 📈 Results
+ | Metric    | Value |
+ |-----------|-------|
+ | Accuracy  | 0.99  |
+ | F1-score  | 0.99  |
+ | Precision | 0.99  |
+ | Recall    | 0.99  |
+
+
+ ```bibtex
+ @misc{russian_spam_detector,
+   title={russian_spam_detector: modern model for spam detection},
+   author={corall88},
+   url={https://huggingface.co/corall88/russian_spam_detector},
+   publisher={Hugging Face},
+   year={2025},
+ }
+ ```
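The updated README states that the classifier emits raw label ids, with LABEL_0 for spam and LABEL_1 for normal messages. A minimal sketch of turning the pipeline's list of `{"label", "score"}` dicts into readable records follows; the `LABEL_NAMES` mapping, the `readable_predictions` helper and its `spam_threshold` parameter are hypothetical and not part of the model or of `transformers`. It assumes the README's label order is correct, which is worth confirming against `model.config.id2label` before relying on it.

```python
from typing import Dict, List, Union

# Hypothetical mapping built from the README's label description.
# Assumption: confirm against model.config.id2label for the real checkpoint.
LABEL_NAMES: Dict[str, str] = {"LABEL_0": "spam", "LABEL_1": "not_spam"}


def readable_predictions(
    predictions: List[Dict[str, Union[str, float]]],
    spam_threshold: float = 0.5,
) -> List[Dict[str, Union[str, float, bool]]]:
    """Translate text-classification pipeline output into readable records.

    A message is flagged as spam only when its top label maps to "spam" and
    the score reaches spam_threshold (a hypothetical knob, not a model setting).
    """
    results = []
    for pred in predictions:
        raw_label = str(pred["label"])
        score = float(pred["score"])
        name = LABEL_NAMES.get(raw_label, raw_label)  # fall back to the raw label
        results.append(
            {
                "label": name,
                "score": score,
                "is_spam": name == "spam" and score >= spam_threshold,
            }
        )
    return results


if __name__ == "__main__":
    # Example shaped like pipeline output; the scores here are made up.
    sample = [{"label": "LABEL_0", "score": 0.97}, {"label": "LABEL_1", "score": 0.88}]
    for record in readable_predictions(sample):
        print(record)
```

Raising `spam_threshold` trades recall for precision: borderline messages are left unflagged instead of being marked as spam.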
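The README reports 0.99 for accuracy, F1, precision and recall but does not say which class counts as positive, how the scores are averaged, or which split they were measured on. As an illustration only, the sketch below computes the four metrics in plain Python, assuming spam (id 0, i.e. LABEL_0) is the positive class; `binary_metrics` is a hypothetical helper and the toy labels are invented, not results from this model. `sklearn.metrics.precision_recall_fscore_support` performs the same computation with explicit averaging options.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 0
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with one class treated as positive.

    Assumption: positive=0 means spam (LABEL_0) is the positive class; the
    README does not state which class or averaging its reported scores use.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when the positive class is never predicted or present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels for illustration only (0 = spam, 1 = not spam), not real results
    y_true = [0, 0, 1, 1, 1]
    y_pred = [0, 1, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
    # {'accuracy': 0.6, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

With spam as the positive class, precision measures how many flagged messages really were spam, while recall measures how much of the spam was caught.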