---
language:
  - ru
library_name: fasttext
pipeline_tag: text-classification
tags:
  - news
  - media
  - russian
  - multilingual
---

# FastText Text Classifier

This is a FastText model for text classification, trained on
my [news dataset](https://huggingface.co/datasets/data-silence/rus_news_classifier), which is hosted on the Hugging Face
Hub.
The training set is a well-balanced sample of news from the last five years.

## Model Description

This model uses FastText to classify text into 11 categories. It was trained on about 70,000 examples and achieves an
accuracy of 0.8691 on a test dataset.

## Task

The model is designed to classify Russian-language news articles into 11 categories.

## Categories

The classifier assigns each news item to one of 11 categories:

- climate (климат)
- conflicts (конфликты)
- culture (культура)
- economy (экономика)
- gloss (глянец)
- health (здоровье)
- politics (политика)
- science (наука)
- society (общество)
- sports (спорт)
- travel (путешествия)
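
The list above can be kept as a small lookup table when predictions need to be shown to Russian-speaking readers. This is a minimal sketch: it assumes the model's labels are exactly the English names listed above (the usage example confirms this only for `sports`), and the names `CATEGORY_NAMES_RU` and `display_name` are hypothetical helpers, not part of the model.

```python
# English label -> Russian display name, copied from the category list.
# Assumption: the model emits exactly these English labels.
CATEGORY_NAMES_RU = {
    "climate": "климат",
    "conflicts": "конфликты",
    "culture": "культура",
    "economy": "экономика",
    "gloss": "глянец",
    "health": "здоровье",
    "politics": "политика",
    "science": "наука",
    "society": "общество",
    "sports": "спорт",
    "travel": "путешествия",
}


def display_name(label):
    """Strip fastText's "__label__" prefix and look up the Russian name.

    Unknown labels are returned unchanged rather than raising an error.
    """
    key = label.replace("__label__", "")
    return CATEGORY_NAMES_RU.get(key, key)
```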

## Intended uses & limitations

The "gloss" category is used to flag yellow press, trashy and dubious news. The model can confuse the politics, society
and conflicts categories with one another.
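
Because politics, society and conflicts overlap, it may help to inspect the top few predictions rather than only the best one. The sketch below shows one possible way to do that. It assumes `model` is the object returned by `fasttext.load_model`, whose `predict` method accepts a `k` argument; the helper names `top_categories` and `is_ambiguous` and the `margin` value of 0.2 are hypothetical and have not been tuned on this model.

```python
def top_categories(model, text, k=3):
    """Return the k most likely (label, score) pairs for one text."""
    # fastText's predict() processes one line at a time, so replace newlines.
    labels, scores = model.predict(text.replace("\n", " "), k=k)
    return [
        (label.replace("__label__", ""), float(score))
        for label, score in zip(labels, scores)
    ]


def is_ambiguous(ranked, margin=0.2):
    """True when the two best scores differ by less than `margin`.

    `margin` is an illustrative threshold, not a value from the model card.
    """
    return len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin
```

A text flagged by `is_ambiguous` could, for example, be routed to manual review or displayed with both candidate categories.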

## Usage

To use this model, you will need the `fasttext` and `huggingface_hub` libraries. Install them using pip:

`pip install fasttext huggingface_hub`

Example of how to use the model:

```python
from huggingface_hub import hf_hub_download
import fasttext


class FastTextClassifierPipeline:
    def __init__(self, model_path):
        self.model = fasttext.load_model(model_path)

    def __call__(self, texts):
        if isinstance(texts, str):
            texts = [texts]

        results = []
        for text in texts:
            prediction = self.model.predict(text)
            label = prediction[0][0].replace("__label__", "")
            score = float(prediction[1][0])
            results.append({"label": label, "score": score})

        return results


def pipeline(task="text-classification", model=None):
    # Download the model .bin file from the Hugging Face Hub
    repo_id = "data-silence/fasttext-rus-news-classifier"
    model_file = hf_hub_download(repo_id=repo_id, filename="fasttext_news_classifier.bin")
    return FastTextClassifierPipeline(model_file)


# Create the classifier
classifier = pipeline("text-classification")

# Use the classifier
text = "В Париже завершилась церемония закрытия Олимпийских игр"
result = classifier(text)
print(result)
# [{'label': 'sports', 'score': 1.0000100135803223}]
```
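
The sample output above reports a score slightly greater than 1, which is presumably a floating-point artifact of how the probability is computed. If downstream code expects a value in the range [0, 1], the score can be clamped. The helper name `normalize_result` is hypothetical; it operates on the `{"label", "score"}` dictionaries produced by the example class above.

```python
def normalize_result(result):
    """Return a copy of one result dict with its score clamped to [0, 1]."""
    score = min(1.0, max(0.0, float(result["score"])))
    return {"label": result["label"], "score": score}


# Result copied from the usage example above.
raw = [{"label": "sports", "score": 1.0000100135803223}]
cleaned = [normalize_result(r) for r in raw]
print(cleaned)
# [{'label': 'sports', 'score': 1.0}]
```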

## Contacts

If you have any questions or suggestions for improving the model, please create an issue in this repository or contact
me at enjoy@data-silence.com.