DictaBERT
Collection of state-of-the-art language models for Hebrew, fine-tuned for various tasks, as detailed in the article: https://arxiv.org/abs/2308.16687
DictaBERT is a state-of-the-art language model for Hebrew, described in the article at https://arxiv.org/abs/2308.16687.
This is the fine-tuned model for the morphological tagging task.
For the bert-base models for other tasks, see the DictaBERT collection.
Sample usage:
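from transformers import AutoModel, AutoTokenizer
# Load the tokenizer and the fine-tuned morphological tagging model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictabert-morph')
# trust_remote_code=True allows transformers to load the custom model code shipped in the model repository.
model = AutoModel.from_pretrained('dicta-il/dictabert-morph', trust_remote_code=True)
model.eval()  # switch to inference mode
sentence = 'בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים'
# predict takes a list of sentences together with the tokenizer.
print(model.predict([sentence], tokenizer))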
Output:
[{
"text": "בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים",
"tokens": [{
"token": "בשנת",
"pos": "NOUN",
"feats": {
"Gender": "Fem",
"Number": "Sing"
},
"prefixes": ["ADP"],
"suffix": false
}, {
"token": "1948",
"pos": "NUM",
"feats": {},
"prefixes": [],
"suffix": false
}, {
"token": "השלים",
"pos": "VERB",
"feats": {
"Gender": "Masc",
"Number": "Sing",
"Person": "3",
"Tense": "Past"
},
"prefixes": [],
"suffix": false
}, {
"token": "אפרים",
"pos": "PROPN",
"feats": {},
"prefixes": [],
"suffix": false
}, {
"token": "קישון",
"pos": "PROPN",
"feats": {},
"prefixes": [],
"suffix": false
}, {
"token": "את",
"pos": "ADP",
"feats": {},
"prefixes": [],
"suffix": false
}, {
"token": "לימודיו",
"pos": "NOUN",
"feats": {
"Gender": "Masc",
"Number": "Plur"
},
"prefixes": [],
"suffix": "PRON",
"suffix_feats": {
"Gender": "Masc",
"Number": "Sing",
"Person": "3"
}
}, {
"token": "בפיסול",
"pos": "NOUN",
"feats": {
"Gender": "Masc",
"Number": "Sing"
},
"prefixes": ["ADP"],
"suffix": false
}, {
"token": "מתכת",
"pos": "NOUN",
"feats": {
"Gender": "Fem",
"Number": "Sing"
},
"prefixes": [],
"suffix": false
}, {
"token": "ובתולדות",
"pos": "NOUN",
"feats": {
"Gender": "Fem",
"Number": "Plur"
},
"prefixes": ["CCONJ", "ADP"],
"suffix": false
}, {
"token": "האמנות",
"pos": "NOUN",
"feats": {
"Gender": "Fem",
"Number": "Sing"
},
"prefixes": ["DET"],
"suffix": false
}, {
"token": "והחל",
"pos": "VERB",
"feats": {
"Gender": "Masc",
"Number": "Sing",
"Person": "3",
"Tense": "Past"
},
"prefixes": ["CCONJ"],
"suffix": false
}, {
"token": "לפרסם",
"pos": "VERB",
"feats": {},
"prefixes": [],
"suffix": false
}, {
"token": "מאמרים",
"pos": "NOUN",
"feats": {
"Gender": "Masc",
"Number": "Plur"
},
"prefixes": [],
"suffix": false
}, {
"token": "הומוריסטיים",
"pos": "ADJ",
"feats": {
"Gender": "Masc",
"Number": "Plur"
},
"prefixes": [],
"suffix": false
}]
}]
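The nested output above can be awkward to scan, so here is a minimal sketch of one way to flatten it into rows with UD-style feature strings. It assumes the return value of `model.predict` is a list of Python dicts shaped like the output shown, with JSON `false` appearing as Python `False`. The helper names `format_feats` and `summarize` are hypothetical and not part of the model's API, and the `sample` data is a hand-copied two-token excerpt of the output above so that the sketch runs without downloading the model.

```python
from typing import Any


def format_feats(feats: dict[str, str]) -> str:
    """Render a feature dict as a UD-style string, e.g. 'Gender=Fem|Number=Sing'."""
    if not feats:
        return "_"
    return "|".join(f"{key}={value}" for key, value in sorted(feats.items()))


def summarize(predictions: list[dict[str, Any]]) -> list[tuple[str, str, str, str, str]]:
    """Flatten predictions into (token, pos, feats, prefixes, suffix) rows."""
    rows = []
    for sentence in predictions:
        for tok in sentence["tokens"]:
            # Prefix tags are joined in order, e.g. 'CCONJ+ADP'; '_' means none.
            prefixes = "+".join(tok["prefixes"]) or "_"
            # "suffix" is False when there is no suffix, otherwise a POS tag
            # such as 'PRON', with its features stored under "suffix_feats".
            if tok["suffix"]:
                suffix = f'{tok["suffix"]}({format_feats(tok.get("suffix_feats", {}))})'
            else:
                suffix = "_"
            rows.append((tok["token"], tok["pos"], format_feats(tok["feats"]), prefixes, suffix))
    return rows


# Hand-copied excerpt of the output above (two tokens only), not a fresh model run.
sample = [{
    "text": "בשנת לימודיו",
    "tokens": [
        {"token": "בשנת", "pos": "NOUN",
         "feats": {"Gender": "Fem", "Number": "Sing"},
         "prefixes": ["ADP"], "suffix": False},
        {"token": "לימודיו", "pos": "NOUN",
         "feats": {"Gender": "Masc", "Number": "Plur"},
         "prefixes": [], "suffix": "PRON",
         "suffix_feats": {"Gender": "Masc", "Number": "Sing", "Person": "3"}},
    ],
}]

rows = summarize(sample)
```

With a loaded model, the same function could presumably be applied directly as `summarize(model.predict([sentence], tokenizer))`, provided the returned structure matches the output shown.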
If you use DictaBERT in your research, please cite "DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew".
BibTeX:
@misc{shmidman2023dictabert,
title={DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew},
author={Shaltiel Shmidman and Avi Shmidman and Moshe Koppel},
year={2023},
eprint={2308.16687},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
This work is licensed under a Creative Commons Attribution 4.0 International License.