avichr committed
Commit 00db184
1 Parent(s): a29a90d

large, oscar, v0.1

Files changed (1)
  1. README.md +6 -19
README.md CHANGED
@@ -1,19 +1,10 @@
 # THIS IS A BETA REPO
 We will release a better one soon :)
 
-
-<br><br>
-## Train details:
-**Tokenizer**: large vocab (52K) <br>
-**LM dataset**: OSCAR<br>
-**Sentiment dataset**: comments on news
-
-
-
 ## HeBERT: Pre-trained BERT for Polarity Analysis and Emotion Recognition
 HeBERT is a Hebrew pretrained language model. It is based on Google's BERT architecture with the BERT-Base configuration [(Devlin et al. 2018)](https://arxiv.org/abs/1810.04805). <br>
 
-HeBERT was trained on three datasets:
+### HeBERT was trained on three datasets:
 1. A Hebrew version of OSCAR [(Ortiz, 2019)](https://oscar-corpus.com/): ~9.8 GB of data, including 1 billion words and over 20.8 million sentences.
 2. A Hebrew dump of Wikipedia: ~650 MB of data, including over 63 million words and 3.8 million sentences.
 3. Emotion UGC data collected for the purpose of this study (described below).
@@ -24,12 +15,8 @@ Our User Generated Content (UGC) is comments written on articles collected from 3
 4000 sentences were annotated by crowd members (3-10 annotators per sentence) for 8 emotions (anger, disgust, expectation, fear, happy, sadness, surprise, and trust) and for overall sentiment/polarity.<br>
 To validate the annotation, we measured agreement between raters on the emotions in each sentence using Krippendorff's alpha [(Krippendorff, 1970)](https://journals.sagepub.com/doi/pdf/10.1177/001316447003000105) and kept only sentences with alpha > 0.7. Note that while raters generally agreed on emotions such as happy, trust, and disgust, a few emotions drew general disagreement, apparently because they are hard to identify in text (e.g. expectation and surprise).
 
-### Performance
-#### Sentiment analysis
-|              | precision | recall | f1-score |
-|--------------|-----------|--------|----------|
-| 0            | 0.95      | 0.97   | 0.96     |
-| 1            | 0.91      | 0.84   | 0.87     |
-| accuracy     | 0.94      | 305    | 0.92     |
-| macro avg    | 0.93      | 0.91   | 0.92     |
-| weighted avg | 0.94      | 0.94   | 0.94     |
+## Stay tuned!
+We are still working on our model and will edit this page as we progress.<br>
+Note that we have released only sentiment analysis (polarity) at this point; emotion detection will be released later.<br>
+Our git: https://github.com/avichaychriqui/HeBERT
+
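The agreement filter the README describes (keep only sentences with Krippendorff's alpha > 0.7) can be sketched with a standard nominal-data implementation of the statistic. This is an illustrative reimplementation, not code from the HeBERT repo, and the `units` annotations below are invented:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of units (e.g. sentences); each unit is the list of
    labels its raters assigned. Units with fewer than 2 labels are
    skipped, since they contribute no pairable judgments.
    """
    coincidences = Counter()  # o[(c, k)]: weighted label co-occurrences
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # every ordered pair of judgments within the unit, weighted 1/(m-1)
        for c, k in permutations(labels, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    totals = Counter()        # n_c: marginal total per label
    for (c, _k), w in coincidences.items():
        totals[c] += w
    n = sum(totals.values())

    # observed disagreement: coincidence mass on mismatched label pairs
    d_o = sum(w for (c, k), w in coincidences.items() if c != k)
    # expected disagreement under chance pairing of the marginals
    d_e = sum(totals[c] * totals[k] for c, k in permutations(totals, 2)) / (n - 1)
    if d_e == 0:              # no label variation at all: perfect agreement
        return 1.0
    return 1.0 - d_o / d_e

# Invented toy annotations: 3 raters per sentence
units = [
    ["happy", "happy", "happy"],
    ["anger", "anger", "fear"],
    ["trust", "trust", "trust"],
]
alpha = krippendorff_alpha_nominal(units)
print(round(alpha, 3))  # → 0.724, so this batch would pass the > 0.7 filter
```

Whether the authors computed alpha per sentence or over batches of sentences is not stated in the README; the sketch above shows the standard corpus-level form of the statistic.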