---
language:
- ru
library_name: fasttext
pipeline_tag: text-classification
tags:
- news
- media
- russian
- multilingual
---

# FastText Text Classifier

This is a FastText model for text classification, trained on my [news dataset](https://huggingface.co/datasets/data-silence/rus_news_classifier), which is hosted on Hugging Face Hub. The training dataset is a well-balanced sample of news from the last five years.

## Model Description

This model uses FastText to classify text into 11 categories. It was trained on about 70,000 examples and achieves an accuracy of approximately 0.869 on a test dataset.

## Task

The model is intended to classify news articles in any language into 11 categories, although it was originally trained to categorize Russian-language news.

## Categories

The classifier assigns each news item to one of 11 categories:

- climate (климат)
- conflicts (конфликты)
- culture (культура)
- economy (экономика)
- gloss (глянец)
- health (здоровье)
- politics (политика)
- science (наука)
- society (общество)
- sports (спорт)
- travel (путешествия)

## Intended uses & limitations

The "gloss" category is used for yellow press, trashy and dubious news.

The model can confuse the politics, society and conflicts categories with one another.

## Usage

To use this model, you will need the `fasttext` library and `huggingface_hub` (installed together with `transformers`).
Install them using pip:

`pip install fasttext transformers`

Example of how to use the model:

```python
from huggingface_hub import hf_hub_download
import fasttext


class FastTextClassifierPipeline:
    def __init__(self, model_path):
        self.model = fasttext.load_model(model_path)

    def __call__(self, texts):
        # Accept either a single string or a list of strings
        if isinstance(texts, str):
            texts = [texts]
        results = []
        for text in texts:
            # fastText predicts one line at a time, so replace line breaks
            prediction = self.model.predict(text.replace("\n", " "))
            label = prediction[0][0].replace("__label__", "")
            score = float(prediction[1][0])
            results.append({"label": label, "score": score})
        return results


def pipeline(task="text-classification", model=None):
    # Download the model .bin file from the Hugging Face Hub
    # (the `task` and `model` arguments are accepted but not used)
    repo_id = "data-silence/fasttext-rus-news-classifier"
    model_file = hf_hub_download(repo_id=repo_id, filename="fasttext_news_classifier.bin")
    return FastTextClassifierPipeline(model_file)


# Create the classifier
classifier = pipeline("text-classification")

# Use the classifier
# (the sample text reads: "The closing ceremony of the Olympic Games has ended in Paris")
text = "В Париже завершилась церемония закрытия Олимпийских игр"
result = classifier(text)
print(result)
# [{'label': 'sports', 'score': 1.0000100135803223}]
```

## Contacts

If you have any questions or suggestions for improving the model, please open an issue in this repository or contact me at enjoy@data-silence.com.
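
## Top-k predictions

Because the model can confuse politics, society and conflicts, it may help to look at several candidate categories instead of only the top one. The following is a minimal sketch, not part of the published model: the helper name `top_k_labels` is hypothetical, and `model` is assumed to be the object returned by `fasttext.load_model()`, whose `predict` method accepts a `k` argument. Scores are capped at 1.0, since fastText can return values slightly above 1 (as in the usage example output).

```python
def top_k_labels(model, text, k=3):
    """Return the k most likely categories for one text as (label, score) pairs.

    Hypothetical helper: `model` is assumed to behave like a model loaded
    with fasttext.load_model().
    """
    # fastText predicts one line at a time, so collapse all whitespace,
    # including line breaks, into single spaces.
    clean = " ".join(text.split())
    labels, scores = model.predict(clean, k=k)
    return [
        # Strip fastText's label prefix and cap scores that drift above 1.0.
        (label.replace("__label__", ""), min(float(score), 1.0))
        for label, score in zip(labels, scores)
    ]
```

With the classifier from the usage example, this could be called as `top_k_labels(classifier.model, text, k=3)`; close scores among the first few labels suggest the article sits between categories.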