GitHub issues classifier (using zero-shot classification)

Predicts whether a statement is a feature request, an issue/bug, or a question.

This model was trained with the zero-shot classifier distillation method, using the BART-large-mnli model as the teacher model, to train a classifier on GitHub issues from the GitHub Issues Prediction dataset.

Labels

As per the dataset's Kaggle competition, the classifier predicts whether an issue is a bug, a feature, or a question. After experimenting with different labels before training, I used a different label mapping that yielded better predictions (see notebook here for details). The labels are:

  • issue
  • feature request
  • question
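
A minimal sketch of how the classifier could be queried with the `transformers` text-classification pipeline. The model ID below is a hypothetical placeholder (the card does not state it), and the helper names `load_classifier` and `classify_title` are illustrative only.

```python
# Hypothetical placeholder: replace with the actual model ID on the Hub.
MODEL_ID = "your-namespace/github-issues-classifier"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (requires `transformers`)."""
    # Imported lazily so the helpers below work without the library installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify_title(classifier, title):
    """Return (label, score) for the top prediction on one issue title.

    `classifier` is any callable that returns a list of dicts with
    "label" and "score" keys, which is what the pipeline returns
    for a single input string.
    """
    top = classifier(title)[0]
    return top["label"], top["score"]
```

Assuming the placeholder is replaced with a real model ID, usage would look like `classify_title(load_classifier(), "App crashes when opening settings")`, which should return one of the three labels above together with its score.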

Training data

  • 15k GitHub issue titles ("unlabeled_titles_simple.txt")
  • Hypothesis used: "This request is a {}"
  • Teacher model used: valhalla/distilbart-mnli-12-1
  • Student model used: distilbert-base-uncased
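
To illustrate how the hypothesis template is used, here is a rough sketch of the data flow in zero-shot distillation: each title is paired with one hypothesis per label, the teacher scores each pair for entailment, and the scores are normalized into a soft target for the student. The function names and the example logits are assumptions for illustration; this is not the actual training script.

```python
import math

HYPOTHESIS_TEMPLATE = "This request is a {}"
LABELS = ["issue", "feature request", "question"]


def build_nli_pairs(title, labels=LABELS, template=HYPOTHESIS_TEMPLATE):
    """Return one (premise, hypothesis) pair per candidate label."""
    return [(title, template.format(label)) for label in labels]


def soft_labels(entailment_logits, temperature=1.0):
    """Softmax over per-label entailment logits, giving a target distribution.

    Assumes single-label classification, so the probabilities sum to 1.
    """
    scaled = [x / temperature for x in entailment_logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


pairs = build_nli_pairs("Add dark mode to the dashboard")
# Made-up teacher entailment logits, one per label, for illustration only.
targets = soft_labels([0.3, 2.1, -1.0])
```

The student is then trained to reproduce `targets` from the raw title alone, so it no longer needs a hypothesis at inference time.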

Results

Agreement of student and teacher predictions: 94.82%
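
For reference, agreement here is taken to mean the fraction of examples on which the student's top label matches the teacher's. A minimal sketch of that calculation, assuming both prediction lists are aligned by example (the function name and sample data are illustrative):

```python
def agreement(teacher_preds, student_preds):
    """Fraction of examples where student and teacher predict the same label."""
    if len(teacher_preds) != len(student_preds):
        raise ValueError("prediction lists must have the same length")
    if not teacher_preds:
        raise ValueError("prediction lists must not be empty")
    matches = sum(t == s for t, s in zip(teacher_preds, student_preds))
    return matches / len(teacher_preds)


# Toy example: the student disagrees with the teacher on one of four titles.
teacher = ["issue", "question", "feature request", "issue"]
student = ["issue", "question", "issue", "issue"]
score = agreement(teacher, student)  # 3 of 4 match
```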

See this notebook for more information on the feature engineering choices made.

How to train using your own dataset

Acknowledgements
