	---
	language:
	- ru
	library_name: fasttext
	pipeline_tag: text-classification
	tags:
	- news
	- media
	- russian
	- multilingual
	---

	# FastText Text Classifier

	This is a FastText model for text classification, trained on
	my [news dataset](https://huggingface.co/datasets/data-silence/rus_news_classifier), which is hosted on the
	Hugging Face Hub.
	The training dataset is a well-balanced sample of news from the last five years.

	## Model Description

	This model uses FastText to classify text into 11 categories. It was trained on about 70,000 examples and achieves an
	accuracy of 0.8691 on the test set.
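
For readers who want to recompute the metric, here is a minimal sketch of how accuracy is defined: the share of test items whose predicted label matches the gold label. The `accuracy` helper and the toy labels are illustrative assumptions, not the evaluation script used for this model.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy labels for illustration only; these are not the model's real outputs.
gold = ["sports", "economy", "politics", "culture"]
predicted = ["sports", "economy", "society", "culture"]
print(accuracy(gold, predicted))  # 3 of 4 labels match
```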

	## Task

	The model is designed to classify Russian-language news articles into 11 categories.

	## Categories

	The classifier assigns each news item to one of 11 categories:

	- climate (климат)
	- conflicts (конфликты)
	- culture (культура)
	- economy (экономика)
	- gloss (глянец)
	- health (здоровье)
	- politics (политика)
	- science (наука)
	- society (общество)
	- sports (спорт)
	- travel (путешествия)
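
The list above can be kept as a lookup table when the Russian name is needed for display. This is a small sketch that assumes the model emits the English names as labels (with fastText's usual `__label__` prefix); `CATEGORY_NAMES_RU` and `to_russian` are hypothetical names introduced here.

```python
# English label (as listed above) -> Russian display name
CATEGORY_NAMES_RU = {
    "climate": "климат",
    "conflicts": "конфликты",
    "culture": "культура",
    "economy": "экономика",
    "gloss": "глянец",
    "health": "здоровье",
    "politics": "политика",
    "science": "наука",
    "society": "общество",
    "sports": "спорт",
    "travel": "путешествия",
}


def to_russian(label: str) -> str:
    """Strip the "__label__" prefix if present and look up the Russian name.

    Unknown labels are returned unchanged (without the prefix).
    """
    clean = label.replace("__label__", "")
    return CATEGORY_NAMES_RU.get(clean, clean)


print(to_russian("__label__sports"))
```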

	## Intended uses & limitations

	The "gloss" category is used to pick out tabloid ("yellow press"), trashy, and dubious news. The model may confuse
	the politics, society, and conflicts categories with one another.
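
One way to work around the confusion between these three categories is to look at the top two predictions rather than only the best one. The sketch below takes the raw output of `model.predict(text, k=2)` (a tuple of labels and a sequence of probabilities) and flags cases where both top labels fall in the confusable group and their scores are close. The helper name `is_ambiguous` and the `margin` value of 0.2 are assumptions for illustration, not tuned settings.

```python
from typing import Sequence

# Categories the model card says are easily confused with one another
CONFUSABLE = {"politics", "society", "conflicts"}


def is_ambiguous(labels: Sequence[str], probs: Sequence[float], margin: float = 0.2) -> bool:
    """True when the two top labels are both confusable and their scores are close."""
    if len(labels) < 2 or len(probs) < 2:
        return False
    top = labels[0].replace("__label__", "")
    runner_up = labels[1].replace("__label__", "")
    close = (probs[0] - probs[1]) < margin
    return close and top in CONFUSABLE and runner_up in CONFUSABLE


# Made-up prediction tuples in the shape fastText returns for k=2
print(is_ambiguous(("__label__politics", "__label__society"), [0.48, 0.41]))
print(is_ambiguous(("__label__sports", "__label__society"), [0.90, 0.05]))
```

Items flagged this way could be routed to manual review or shown with both candidate labels.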

	## Usage

	To use this model, you will need the `fasttext` and `huggingface_hub` libraries. Install them using pip:

	`pip install fasttext huggingface_hub`

	Example of how to use the model:

	```python
	from huggingface_hub import hf_hub_download
	import fasttext


	class FastTextClassifierPipeline:
	    """Minimal pipeline-style wrapper around a fastText classifier."""

	    def __init__(self, model_path):
	        self.model = fasttext.load_model(model_path)

	    def __call__(self, texts):
	        # Accept either a single string or a list of strings
	        if isinstance(texts, str):
	            texts = [texts]

	        results = []
	        for text in texts:
	            # fastText predicts one line at a time, so replace newlines with spaces
	            prediction = self.model.predict(text.replace("\n", " "))
	            label = prediction[0][0].replace("__label__", "")
	            score = float(prediction[1][0])
	            results.append({"label": label, "score": score})

	        return results


	def pipeline(task="text-classification", model=None):
	    # `task` and `model` are accepted for a familiar call signature but are not used.
	    # Download the model file from the Hugging Face Hub
	    repo_id = "data-silence/fasttext-rus-news-classifier"
	    model_file = hf_hub_download(repo_id=repo_id, filename="fasttext_news_classifier.bin")
	    return FastTextClassifierPipeline(model_file)


	# Create the classifier
	classifier = pipeline("text-classification")

	# Use the classifier
	# Input text: "The closing ceremony of the Olympic Games has ended in Paris"
	text = "В Париже завершилась церемония закрытия Олимпийских игр"
	result = classifier(text)
	print(result)
	# [{'label': 'sports', 'score': 1.0000100135803223}]
	```
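
Note that the score in the example output is slightly above 1.0. This appears to be a small numerical offset in how fastText reports probabilities rather than a meaningful value. If downstream code expects scores in the range [0, 1], they can be clamped. The sketch below works on the list-of-dicts format returned by the classifier above; `clamp_scores` is a hypothetical helper, not part of any library.

```python
from typing import Dict, List, Union

Result = Dict[str, Union[str, float]]


def clamp_scores(results: List[Result]) -> List[Result]:
    """Return a copy of the results with every score limited to [0.0, 1.0]."""
    return [
        {"label": r["label"], "score": min(1.0, max(0.0, float(r["score"])))}
        for r in results
    ]


# Same shape as the classifier output shown in the usage example
raw = [{"label": "sports", "score": 1.0000100135803223}]
print(clamp_scores(raw))
# [{'label': 'sports', 'score': 1.0}]
```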

	## Contacts

	If you have any questions or suggestions for improving the model, please create an issue in this repository or contact
	me at [email protected].