Text classification model for claim/premise detection in essay feedback

gbert-base-claim_premise is a German text classification model for the non-scientific domain, fine-tuned from gbert-base. It was trained on an annotated dataset of claim and premise sentences from essay feedback. The dataset was created by T. Wambsganss, C. Niklaus, M. Söllner, S. Handschuh and J. M. Leimeister and is available here.

Training

The model was fine-tuned for 10 epochs. This repository contains the checkpoint from the second epoch, which achieved the highest accuracy:

Epoch   Eval Loss   Accuracy
1.0     0.4946      0.7621
2.0     0.5074      0.7709
3.0     0.8148      0.7627
4.0     1.1393      0.7560
5.0     1.3645      0.7551
6.0     1.5397      0.7560
7.0     1.8195      0.7548
8.0     2.0723      0.7536
9.0     2.0844      0.7566
10.0    2.1382      0.7563
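
The checkpoint choice can be reproduced from the table with a few lines of Python. This is only an illustrative sketch: the rows are copied from the table, while the names RESULTS, best_epoch_by_accuracy and best_epoch_by_loss are hypothetical helpers and not part of the original training code.

```python
# (epoch, eval_loss, accuracy) rows copied from the table above.
RESULTS = [
    (1, 0.4946, 0.7621),
    (2, 0.5074, 0.7709),
    (3, 0.8148, 0.7627),
    (4, 1.1393, 0.7560),
    (5, 1.3645, 0.7551),
    (6, 1.5397, 0.7560),
    (7, 1.8195, 0.7548),
    (8, 2.0723, 0.7536),
    (9, 2.0844, 0.7566),
    (10, 2.1382, 0.7563),
]


def best_epoch_by_accuracy(rows):
    """Return the epoch with the highest evaluation accuracy."""
    return max(rows, key=lambda row: row[2])[0]


def best_epoch_by_loss(rows):
    """Return the epoch with the lowest evaluation loss."""
    return min(rows, key=lambda row: row[1])[0]


if __name__ == "__main__":
    print("best by accuracy:", best_epoch_by_accuracy(RESULTS))
    print("best by eval loss:", best_epoch_by_loss(RESULTS))
```

The two criteria disagree here: accuracy peaks at epoch 2, while evaluation loss is lowest at epoch 1. The published checkpoint follows the accuracy criterion.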

On this dataset, the model learns to distinguish the two classes, claim and premise, reaching an accuracy of about 0.77. However, the evaluation loss rises sharply after epoch 2 while accuracy stays flat at around 0.755, a sign of overfitting that may point to an imbalanced or noisy dataset. Future work should train the model on more robust data to obtain better evaluation results.

Text Classification Tags

Text Classification Tag   Text Classification Label
0                         CLAIM
1                         PREMISE
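
A minimal sketch of how the tags might be used at inference time, assuming the transformers and torch packages are installed and the checkpoint can be downloaded under the id samirmsallem/gbert-base-claim_premise. The helpers label_from_logits and classify are hypothetical names introduced for illustration. The mapping from tag to label is taken from the table above rather than from the model configuration, which this card does not show.

```python
import math

# Tag-to-label mapping copied from the table above.
ID2LABEL = {0: "CLAIM", 1: "PREMISE"}
MODEL_ID = "samirmsallem/gbert-base-claim_premise"


def label_from_logits(logits):
    """Turn one sentence's logits into (label, probability) via softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # shift for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def classify(sentences):
    """Classify a list of German sentences as CLAIM or PREMISE.

    The heavy imports happen here so that the helper above stays usable
    without transformers or torch installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [label_from_logits(row) for row in logits.tolist()]
```

Calling classify(["Der Text ist gut strukturiert."]) would return a list with one (label, probability) pair. The example sentence is made up, and no output is shown because it depends on the downloaded weights.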

Model size: 110M params (Safetensors, tensor type F32)

Model tree for samirmsallem/gbert-base-claim_premise

Base model: deepset/gbert-base (fine-tuned to produce this model)

Evaluation results

  • Accuracy on samirmsallem/argumentative_student_peer_reviews (self-reported): 0.771