
Model Card for roberta-kaz-large-small-kazakh-corpus

This model is designed for text classification tasks in the Kazakh language, based on the RoBERTa architecture and fine-tuned using the Small Kazakh Corpus dataset.

Model Details

Model Description

The model aims to enhance natural language processing (NLP) capabilities for the Kazakh language, particularly in text classification tasks.

  • Developed by: Tleubayeva Arailym, Tabuldin Aisultan, Aubakirov Sultan
  • Model type: Transformer-based (RoBERTa)
  • Language(s) (NLP): Kazakh (kk)
  • Model size: 355M parameters (F32 tensors)
  • License: apache-2.0
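
How to Use

A minimal usage sketch, assuming the model can be loaded through the Hugging Face transformers text-classification pipeline; the example Kazakh sentence is illustrative, and the predicted label names depend on the fine-tuning configuration, which is not documented here.

```python
from transformers import pipeline

# Load the fine-tuned Kazakh text classifier from the Hugging Face Hub.
# Repository ID as listed on this page.
classifier = pipeline(
    "text-classification",
    model="Arailym-tleubayeva/roberta-kaz-large-small-kazakh-corpus",
)

# Classify an example Kazakh sentence ("The weather is very good today").
result = classifier("Бүгін ауа райы өте жақсы.")
print(result)  # e.g. [{'label': '...', 'score': 0.87}]
```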

Results

Evaluation shows that fine-tuning improves both accuracy and F1-score:

Base model performance:

  • Accuracy: 50.30%
  • F1-score: 48.89%

Fine-tuned model performance:

  • Accuracy: 55.51% (+5.21 percentage points, about a 10% relative improvement)
  • F1-score: 54.83% (+5.94 percentage points, about a 12% relative improvement)
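
A small sketch of how accuracy and F1-score such as the figures above could be computed, assuming predictions and gold labels for a held-out split are available as Python lists; the weighted F1 averaging is an assumption, since the card does not state which F1 variant is reported.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical predicted and gold label IDs for a held-out evaluation split.
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 1, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
# Averaging scheme is an assumption; adjust to match the actual evaluation setup.
f1 = f1_score(y_true, y_pred, average="weighted")

print(f"Accuracy: {accuracy:.2%}, F1-score: {f1:.2%}")
```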

Citation

A citation entry will be added later.

Model Card Authors

Tleubayeva Arailym, PhD student at Astana IT University

Tabuldin Aisultan, 3rd-year student at Astana IT University

Aubakirov Sultan, 3rd-year student at Astana IT University
