data-silence committed on
Commit
a2e268d
1 Parent(s): 264a654

evaluation works

Files changed (3)
  1. README.md +51 -18
  2. inference.py +0 -24
  3. requirements.txt +2 -1
README.md CHANGED
@@ -1,33 +1,36 @@
1
  ---
2
  language:
3
- - ru
4
  library_name: fasttext
5
  pipeline_tag: text-classification
6
  tags:
7
- - news
8
- - media
9
- - russian
10
- - multilingual
11
  ---
12
 
13
  # FastText Text Classifier
14
 
15
- This is a FastText model for text classification, trained on my [news dataset](https://huggingface.co/datasets/data-silence/rus_news_classifier), consisting of news from the last 5 years, hosted on Hugging Face Hub.
 
 
16
  The training dataset is a well-balanced sample of recent news from the last five years.
17
 
18
  ## Model Description
19
 
20
- This model uses FastText to classify text into 11 categories. It has been trained on ~70_000 examples and achieves an accuracy of 0.8691016964865116 on a test dataset.
 
21
 
22
  ## Task
23
 
24
- The model is designed to classify news articles in any language into 11 categories, but was originally trained to categorize Russian-language news.
25
-
26
 
27
  ## Categories
28
 
29
-
30
  The news category is assigned by the classifier to one of 11 categories:
 
31
  - climate (климат)
32
  - conflicts (конфликты)
33
  - culture (культура)
@@ -39,13 +42,12 @@ The news category is assigned by the classifier to one of 11 categories:
39
  - society (общество)
40
  - sports (спорт)
41
  - travel (путешествия)
42
- }
43
-
44
 
45
  ## Intended uses & limitations
46
 
47
- The "gloss" category is used to select yellow press, trashy and dubious news. The model can get confused in the classification of news categories politics, society and conflicts.
48
-
49
 
50
  ## Usage
51
 
@@ -56,15 +58,46 @@ To use this model, you will need the `fasttext` and `transformers` libraries. In
56
  Example of how to use the model:
57
 
58
  ```python
59
- from transformers import pipeline
60
 
61
- classifier = pipeline("text-classification", model="data-silence/fasttext-rus-news-classifier")
 
62
 
63
- text = "Your text to classify here"
 
64
  result = classifier(text)
65
  print(result)
 
66
  ```
67
 
68
  ## Contacts
69
 
70
- If you have any questions or suggestions for improving the model, please create an issue in this repository or contact me at [email protected].
 
 
1
  ---
2
  language:
3
+ - ru
4
  library_name: fasttext
5
  pipeline_tag: text-classification
6
  tags:
7
+ - news
8
+ - media
9
+ - russian
10
+ - multilingual
11
  ---
12
 
13
  # FastText Text Classifier
14
 
15
+ This is a FastText model for text classification, trained on
16
+ my [news dataset](https://huggingface.co/datasets/data-silence/rus_news_classifier), consisting of news from the last 5
17
+ years, hosted on Hugging Face Hub.
18
  The training dataset is a well-balanced sample of recent news from the last five years.
19
 
20
  ## Model Description
21
 
22
+ This model uses FastText to classify text into 11 categories. It has been trained on ~70_000 examples and achieves an
23
+ accuracy of 0.8691016964865116 on a test dataset.
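
For readers who want to reproduce a figure like the accuracy above, here is a minimal sketch of how accuracy is typically computed from predicted and expected labels. The helper name `accuracy` and the toy labels are assumptions for illustration; they are not part of this repository, and the author's actual evaluation script is not shown in this commit.

```python
def accuracy(predicted, expected):
    """Return the fraction of positions where the predicted label matches the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Toy labels, invented for illustration only.
predicted = ["sports", "politics", "society", "climate"]
expected = ["sports", "society", "society", "climate"]
print(accuracy(predicted, expected))
```

In practice `predicted` would hold the classifier's top label for each test article and `expected` the gold labels from the test split.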
24
 
25
  ## Task
26
 
27
+ The model is designed to classify news articles in any language into 11 categories, but was originally trained to
28
+ categorize Russian-language news.
29
 
30
  ## Categories
31
 
 
32
  The news category is assigned by the classifier to one of 11 categories:
33
+
34
  - climate (климат)
35
  - conflicts (конфликты)
36
  - culture (культура)
 
42
  - society (общество)
43
  - sports (спорт)
44
  - travel (путешествия)
45
+ }
 
46
 
47
  ## Intended uses & limitations
48
 
49
+ The "gloss" category is used to select yellow press, trashy and dubious news. The model can get confused in the
50
+ classification of news categories politics, society and conflicts.
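
Because politics, society and conflicts can be confused, one possible way to surface uncertain cases is to compare the two highest-scoring labels and flag the prediction when they are close. The sketch below assumes the top two predictions are already available as `(label, score)` pairs, for example from fastText's `model.predict(text, k=2)`. The helper name `is_ambiguous` and the `0.15` margin are hypothetical choices, not values taken from this model.

```python
def is_ambiguous(ranked, margin=0.15):
    """Flag a prediction whose top two scores differ by less than `margin`.

    `ranked` is a list of (label, score) pairs sorted by score, highest first.
    The default margin is an arbitrary illustration, not a tuned value.
    """
    if len(ranked) < 2:
        return False
    return (ranked[0][1] - ranked[1][1]) < margin


# Invented scores for illustration only.
close_call = [("politics", 0.48), ("society", 0.41)]
clear_call = [("sports", 0.97), ("culture", 0.02)]
print(is_ambiguous(close_call), is_ambiguous(clear_call))
```

Flagged articles could then be routed to manual review or reported with both candidate categories.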
51
 
52
  ## Usage
53
 
 
58
  Example of how to use the model:
59
 
60
  ```python
+ from huggingface_hub import hf_hub_download
+ import fasttext
+
+
+ class FastTextClassifierPipeline:
+     def __init__(self, model_path):
+         self.model = fasttext.load_model(model_path)
+
+     def __call__(self, texts):
+         if isinstance(texts, str):
+             texts = [texts]
+
+         results = []
+         for text in texts:
+             prediction = self.model.predict(text)
+             label = prediction[0][0].replace("__label__", "")
+             score = float(prediction[1][0])
+             results.append({"label": label, "score": score})
+
+         return results
+
+
+ def pipeline(task="text-classification", model=None):
+     # Download the model .bin file
+     repo_id = "data-silence/fasttext-rus-news-classifier"
+     model_file = hf_hub_download(repo_id=repo_id, filename="fasttext_news_classifier.bin")
+     return FastTextClassifierPipeline(model_file)
+
 
+ # Create the classifier
+ classifier = pipeline("text-classification")
 
+ # Use the classifier
+ text = "В Париже завершилась церемония закрытия Олимпийских игр"  # "The closing ceremony of the Olympic Games has ended in Paris"
  result = classifier(text)
  print(result)
+ # [{'label': 'sports', 'score': 1.0000100135803223}]
98
  ```
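
Two practical details are worth a short sketch. First, the sample output above shows a score slightly greater than 1, which looks like a floating-point artifact. Second, fastText's Python `predict` processes one line at a time, so multi-line article text generally needs its newlines removed first. The helpers below are hypothetical (`clean_text` and `to_result` are not part of this repository) and work on plain Python values, so they run without the model being downloaded.

```python
def clean_text(text):
    """Collapse every run of whitespace, including newlines, into a single space."""
    return " ".join(text.split())


def to_result(labels, scores):
    """Convert the (labels, scores) pair returned by predict into a dict.

    Strips the "__label__" prefix and clamps the score into [0.0, 1.0].
    """
    label = labels[0].replace("__label__", "")
    score = min(1.0, max(0.0, float(scores[0])))
    return {"label": label, "score": score}


# Values shaped like the example output above.
print(clean_text("First line\nsecond   line\n"))
print(to_result(("__label__sports",), [1.0000100135803223]))
```

Inside the `__call__` method shown earlier, this would correspond to calling `clean_text(text)` before `self.model.predict(...)` and passing the returned tuple through `to_result`.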
99
 
100
  ## Contacts
101
 
102
+ If you have any questions or suggestions for improving the model, please create an issue in this repository or contact
103
inference.py DELETED
@@ -1,24 +0,0 @@
- import fasttext
- from transformers import pipeline
-
-
- class FastTextClassifierPipeline(pipeline):
-     def __init__(self, model_path):
-         self.model = fasttext.load_model(model_path)
-
-     def __call__(self, texts):
-         if isinstance(texts, str):
-             texts = [texts]
-
-         results = []
-         for text in texts:
-             prediction = self.model.predict(text)
-             label = prediction[0][0].replace("__label__", "")
-             score = prediction[1][0]
-             results.append({"label": label, "score": score})
-
-         return results
-
-
- def pipeline(task="text-classification", model=None):
-     return FastTextClassifierPipeline("model.bin")
requirements.txt CHANGED
@@ -1,2 +1,3 @@
1
  fasttext
2
- transformers
 
 
1
  fasttext
2
+ transformers
3
+ huggingface_hub
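
As a quick sanity check after installing from `requirements.txt`, the following sketch reports which of the listed packages cannot be found in the current environment. The helper name `missing_packages` is hypothetical, and the sketch assumes each package's import name matches its name in the requirements file, which appears to hold for the three entries here.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names taken from requirements.txt in this commit.
required = ["fasttext", "transformers", "huggingface_hub"]
print(missing_packages(required))
```

An empty list means all three packages were found; any names printed still need to be installed.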