---
license: apache-2.0
language:
- tr
pipeline_tag: text-classification
tags:
- job advertisement
- turkish bert
- bert-based
- StratifiedKFold
---

## About the model

The model was trained on 15,451 real job advertisements. It assigns each ad to one of four classes:

- Uygun İlan (suitable ad)
- Is Ilani Degil (not a job ad)
- Mustehcen (obscene)
- Cift Pozisyon (double position)

Training results:

- **The model is based on a Turkish BERT.**
- **Validation used stratified 5-fold cross-validation (`StratifiedKFold(5)`).**
- Per-fold precision: [0.806858621805241, 0.8912621359223301, 0.9440129449838188, 0.9750809061488673, 0.9851132686084142]
- Mean precision: 0.9204655754937342
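
The validation scheme above can be sketched in plain Python. This is a minimal illustration of stratified k-fold splitting (every fold keeps roughly the class proportions of the full dataset), not the authors' training code; in practice scikit-learn's `StratifiedKFold` handles this. The helper `stratified_kfold` and the toy labels are hypothetical. The last lines average the per-fold precision values reported above.

```python
from collections import defaultdict
from statistics import mean


def stratified_kfold(labels, n_splits=5):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class ratios.

    Hypothetical sketch: indices of each class are dealt round-robin across
    the folds (no shuffling), so each fold receives ~1/n_splits of every class.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    fold = 0  # running counter keeps fold sizes balanced across classes
    for indices in by_class.values():
        for idx in indices:
            folds[fold].append(idx)
            fold = (fold + 1) % n_splits

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        test_set = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in test_set]
        yield train_idx, test_idx


# Hypothetical toy labels using the card's four class names.
toy_labels = (["Uygun İlan"] * 10 + ["Is Ilani Degil"] * 5
              + ["Mustehcen"] * 5 + ["Cift Pozisyon"] * 5)

for k, (train_idx, test_idx) in enumerate(stratified_kfold(toy_labels)):
    per_class = {c: sum(toy_labels[i] == c for i in test_idx)
                 for c in sorted(set(toy_labels))}
    print(f"fold {k}: train={len(train_idx)} test={len(test_idx)} {per_class}")

# Per-fold precision values as reported on this card.
fold_precision = [0.806858621805241, 0.8912621359223301, 0.9440129449838188,
                  0.9750809061488673, 0.9851132686084142]
print(mean(fold_precision))  # ≈ 0.9205, the reported mean precision
```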

| Metric | Uygun İlan | Is Ilani Degil | Mustehcen | Cift Pozisyon |
| ------ | ------ | ------ | ------ | ------ |
| Precision | 0.986 | 0.996 | 0.966 | 0.970 |
| Recall | 0.992 | 0.986 | 0.966 | 0.959 |
| F1 Score | 0.989 | 0.991 | 0.966 | 0.965 |

Accuracy: 0.975
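
The F1 row is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the table's precision and recall values. Because those inputs are already rounded to three decimals, a difference of up to 0.001 from the reported F1 is expected (for example, Cift Pozisyon comes out as 0.964 rather than 0.965).

```python
# Precision and recall per class, copied from the table above (rounded to 3 decimals).
table = {
    "Uygun İlan": (0.986, 0.992),
    "Is Ilani Degil": (0.996, 0.986),
    "Mustehcen": (0.966, 0.966),
    "Cift Pozisyon": (0.970, 0.959),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for label, (p, r) in table.items():
    print(f"{label}: F1 = {f1(p, r):.3f}")
```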
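
## Example

**Important:** the text passed to the pipeline must not contain Turkish-specific characters (ç, ğ, ı, İ, ö, ş, ü). Lower-case the input and replace those characters with their ASCII counterparts, as `set_sentence` does in the example below.
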
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

MODEL_ID = "nanelimon/bert-base-turkish-job-advertisement"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)

# Map lower-case Turkish letters to their ASCII counterparts.
TR_TO_ASCII = str.maketrans("çğıöşü", "cgiosu")


def set_sentence(sentence: str) -> str:
    """Lower-case the text and replace Turkish characters, as the model expects."""
    # Replace the dotted capital İ first: str.lower() would turn it into
    # "i" followed by a combining dot (U+0307), which the mapping would miss.
    return sentence.replace("İ", "i").lower().translate(TR_TO_ASCII)


print(pipe(set_sentence("Fiziği düzgün 17 yaş kızlar aranıyor")))
```

Result:

```text
[{'label': 'Mustehcen', 'score': 0.9992677569389343}]
```

- `label`: the class the model assigns to the input text.
- `score`: the model's confidence in that label, between 0 and 1.
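
For a single-label classifier like this one, the pipeline's `score` is, as we understand the default behavior, the softmax probability of the highest-scoring class, and `label` is that class's name from the model config's `id2label` mapping. The sketch below reproduces that step in plain Python; the logits and the label order in `ID2LABEL` are hypothetical, not read from the model (check `model.config.id2label` for the real mapping).

```python
import math

# Hypothetical id-to-label order; the real one lives in model.config.id2label.
ID2LABEL = {0: "Uygun İlan", 1: "Is Ilani Degil", 2: "Mustehcen", 3: "Cift Pozisyon"}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return a pipeline-style {'label', 'score'} dict for the most likely class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits for one input sentence.
print(top_prediction([-1.2, -0.8, 6.5, 0.3]))
```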

## Authors

- Seyma SARIGIL
- Murat KOKLU

- [Review the Master's thesis](https://drive.google.com/file/d/1uFj7DrFhXv-_X6QYUXBdDa0o76M9p2cE/view?usp=sharing)

## License

apache-2.0

**Free Software, Hell Yeah!**