	---
	license: apache-2.0
	base_model: sentence-transformers/LaBSE
	tags:
	- generated_from_trainer
	- news
	- russian
	- media
	- text-classification
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	model-index:
	- name: news_classifier_ft
	  results: []
	datasets:
	- data-silence/rus_news_classifier
	pipeline_tag: text-classification
	language:
	- ru
	widgets:
	- text: Введите новостной текст для классификации
	  example_title: Классификация новостей
	  button_text: Классифицировать
	  api_name: classify
	library_name: transformers
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# any-news-classifier

	This model is a fine-tuned version of [sentence-transformers/LaBSE](https://huggingface.co/sentence-transformers/LaBSE) on my [news dataset](https://huggingface.co/datasets/data-silence/rus_news_classifier).
	The training dataset is a well-balanced sample of news from the last five years.

	It achieves the following results on the evaluation set:
	- Loss: 0.3820
	- Accuracy: 0.9029
	- F1: 0.9025
	- Precision: 0.9030
	- Recall: 0.9029
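
The card does not say how F1, precision, and recall are averaged over the 11 classes. The reported Recall equals Accuracy, which is what support-weighted averaging always produces, so the sketch below assumes weighted averaging. It uses plain Python and made-up toy labels (not the evaluation data) to show the computation; the function name `weighted_metrics` is illustrative.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n         # class weight = share of true examples
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are NOT from the evaluation set.
y_true = ["sports", "sports", "politics", "society", "society", "economy"]
y_pred = ["sports", "sports", "society", "society", "politics", "economy"]

metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

Weighted recall sums `correct / n` over all classes, so it reduces to accuracy regardless of the data.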

	## Model description

	This is a multi-class classifier for Russian news, built by fine-tuning the LaBSE model for the [AntiSMI Project](https://github.com/data-silence/antiSMI-Project).
	The classifier assigns each news item to one of 11 categories:
	- climate (климат)
	- conflicts (конфликты)
	- culture (культура)
	- economy (экономика)
	- gloss (глянец)
	- health (здоровье)
	- politics (политика)
	- science (наука)
	- society (общество)
	- sports (спорт)
	- travel (путешествия)
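
The usage code in this card maps `LABEL_0` through `LABEL_10` to these categories in the same alphabetical order as the list above. A minimal sketch that builds the mapping from the ordered list instead of typing it by hand; the names `CATEGORIES`, `id2label`, and `label2id` are illustrative, and this card does not state whether the published checkpoint's config already carries readable label names.

```python
# Category names in label-index order (alphabetical, as listed in this card).
CATEGORIES = [
    "climate", "conflicts", "culture", "economy", "gloss", "health",
    "politics", "science", "society", "sports", "travel",
]

# Integer id <-> category name, in both directions.
id2label = dict(enumerate(CATEGORIES))
label2id = {name: idx for idx, name in id2label.items()}

# Generic pipeline label ("LABEL_<id>") -> category name.
category_mapper = {f"LABEL_{idx}": name for idx, name in id2label.items()}

print(category_mapper["LABEL_9"])
```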

	## Testing this model on `Spaces`

	You can try the model and evaluate its quality in this [Hugging Face Space](https://huggingface.co/spaces/data-silence/rus-news-classifier).


	## How to use

	```python
	from transformers import pipeline

	# Map the model's generic output labels to readable category names
	category_mapper = {
	    'LABEL_0': 'climate',
	    'LABEL_1': 'conflicts',
	    'LABEL_2': 'culture',
	    'LABEL_3': 'economy',
	    'LABEL_4': 'gloss',
	    'LABEL_5': 'health',
	    'LABEL_6': 'politics',
	    'LABEL_7': 'science',
	    'LABEL_8': 'society',
	    'LABEL_9': 'sports',
	    'LABEL_10': 'travel'
	}

	# Use the pretrained model from the Hugging Face Hub
	classifier = pipeline("text-classification", model="data-silence/rus-news-classifier")

	def predict_category(text):
	    """Return the predicted category name and its score for one text."""
	    result = classifier(text)
	    category = category_mapper[result[0]['label']]
	    score = result[0]['score']
	    return category, score

	predict_category("В Париже завершилась церемония закрытия Олимпийских игр")
	# ('sports', 0.9959506988525391)
	```
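
For more than one headline, a `transformers` text-classification pipeline also accepts a list of strings and returns one result per input. The sketch below wraps that in a helper. `predict_categories` is a hypothetical name, and the classifier is passed in as an argument so the helper can be exercised with a stand-in function when the model is not downloaded; the stand-in's label and score are made up.

```python
CATEGORIES = [
    "climate", "conflicts", "culture", "economy", "gloss", "health",
    "politics", "science", "society", "sports", "travel",
]
category_mapper = {f"LABEL_{i}": name for i, name in enumerate(CATEGORIES)}


def predict_categories(texts, classifier):
    """Classify several texts at once; return (category, score) pairs."""
    results = classifier(list(texts))  # one {'label', 'score'} dict per text
    return [(category_mapper[r["label"]], r["score"]) for r in results]


def fake_classifier(texts):
    """Stand-in for the real pipeline; always answers LABEL_9 (made-up score)."""
    return [{"label": "LABEL_9", "score": 0.99} for _ in texts]


headlines = ["В Париже завершилась церемония закрытия Олимпийских игр"]
predictions = predict_categories(headlines, fake_classifier)
print(predictions)

# With the real model, pass the pipeline object instead of the stand-in:
#   from transformers import pipeline
#   classifier = pipeline("text-classification", model="data-silence/rus-news-classifier")
#   predict_categories(headlines, classifier)
```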


	## Intended uses & limitations

	The "gloss" category covers tabloid, sensational, and otherwise dubious news. The model can confuse the politics, society, and conflicts categories with one another.
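
One way to handle the overlap between politics, society, and conflicts is to look at the full score distribution and flag predictions where the top two categories both fall in that group and lie close together. The sketch below assumes the scores are available as a list of `{'label', 'score'}` dicts with readable category names (for example after calling the pipeline with `top_k=None` and applying the label mapping). The helper name `is_ambiguous` and the 0.2 margin are illustrative choices, not values from the model's author.

```python
# Categories this card says the model can mix up.
CONFUSABLE = {"politics", "society", "conflicts"}


def is_ambiguous(scores, margin=0.2):
    """Flag a prediction whose top two labels are both confusable and close."""
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    both_confusable = top["label"] in CONFUSABLE and runner_up["label"] in CONFUSABLE
    return both_confusable and (top["score"] - runner_up["score"]) < margin


# Made-up score distributions for illustration only.
close_call = [
    {"label": "politics", "score": 0.48},
    {"label": "society", "score": 0.41},
    {"label": "economy", "score": 0.11},
]
clear_case = [
    {"label": "sports", "score": 0.97},
    {"label": "society", "score": 0.02},
    {"label": "politics", "score": 0.01},
]

print(is_ambiguous(close_call), is_ambiguous(clear_case))
```

Flagged items could be routed to manual review or assigned both candidate categories, depending on the application.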

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 5
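
To make the schedule concrete: with a linear scheduler the learning rate decays from its peak to zero over the whole run. The sketch below assumes no warmup steps (the card lists none) and takes the total of 17980 optimizer steps and 3596 steps per epoch from the training results table. The function name `linear_lr` is illustrative.

```python
PEAK_LR = 2e-5          # learning_rate from the hyperparameter list
TOTAL_STEPS = 17980     # last step in the training results table
STEPS_PER_EPOCH = 3596  # step count after epoch 1 in the same table


def linear_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start of training and at each epoch boundary.
for epoch in range(6):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch}: step {step:>5}, lr = {linear_lr(step):.2e}")
```

With a batch size of 16 and 3596 steps per epoch, the training split holds at most 3596 × 16 = 57,536 examples, assuming a single device and no gradient accumulation.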

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss | Accuracy | F1     | Precision | Recall |
	|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|:---------:|:------:|
	| 0.3544        | 1.0   | 3596  | 0.3517          | 0.8861   | 0.8860 | 0.8915    | 0.8861 |
	| 0.2738        | 2.0   | 7192  | 0.3190          | 0.8995   | 0.8987 | 0.9025    | 0.8995 |
	| 0.19          | 3.0   | 10788 | 0.3524          | 0.9016   | 0.9015 | 0.9019    | 0.9016 |
	| 0.1402        | 4.0   | 14384 | 0.3820          | 0.9029   | 0.9025 | 0.9030    | 0.9029 |
	| 0.1055        | 5.0   | 17980 | 0.4399          | 0.9022   | 0.9018 | 0.9024    | 0.9022 |
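
The table can also be read programmatically to compare checkpoint-selection criteria. The sketch below copies the rows above into a list and picks the best epoch by validation loss and by accuracy; the variable names are illustrative.

```python
# (epoch, training_loss, validation_loss, accuracy, f1), copied from the table.
rows = [
    (1, 0.3544, 0.3517, 0.8861, 0.8860),
    (2, 0.2738, 0.3190, 0.8995, 0.8987),
    (3, 0.19,   0.3524, 0.9016, 0.9015),
    (4, 0.1402, 0.3820, 0.9029, 0.9025),
    (5, 0.1055, 0.4399, 0.9022, 0.9018),
]

best_by_loss = min(rows, key=lambda r: r[2])      # lowest validation loss
best_by_accuracy = max(rows, key=lambda r: r[3])  # highest accuracy

print("best epoch by validation loss:", best_by_loss[0])
print("best epoch by accuracy:", best_by_accuracy[0])
```

By validation loss the best epoch is 2; by accuracy it is 4, whose metrics match the evaluation results reported at the top of this card. Validation loss rises after epoch 2 while training loss keeps falling, a common sign of overfitting.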


	### Framework versions

	- Transformers 4.42.4
	- Pytorch 2.3.1+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1
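
To compare a local environment against these versions, a small check along the following lines may help. It is a sketch: the pip distribution names (for example `torch` for Pytorch) are assumptions, and `compare_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by the assumed pip distribution name.
EXPECTED = {
    "transformers": "4.42.4",
    "torch": "2.3.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (installed_or_None, expected)} for each listed package."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, wanted)
    return report


report = compare_versions()
for package, (installed, wanted) in report.items():
    status = "ok" if installed == wanted else "differs"
    print(f"{package}: installed={installed}, card={wanted} ({status})")
```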