Tags: Hebrew, vad, valence, arousal, dominance, regression, knesset
GiliGold committed f630722 (verified; parent 9834f42)

Update README.md
Files changed (1): README.md (+46, -3)

The previous README.md consisted only of the front matter `license: cc-by-sa-4.0`. This commit keeps that front matter and adds the model card below.

---
license: cc-by-sa-4.0
---
# VAD Binomial Regression Models
This repository contains three binomial regression models that predict VAD (Valence, Arousal, Dominance) scores for text inputs.
Each model is stored as a separate pickle (`.pkl`) file:

- `valence_model.pkl`: predicts the valence score (positivity/negativity).
- `arousal_model.pkl`: predicts the arousal score (level of excitement or calm).
- `dominance_model.pkl`: predicts the dominance score (sense of control or influence).

All scores are normalized to a scale from 0 to 1.

Before making predictions, input text must be converted into embeddings using the [Knesset-multi-e5-large](https://huggingface.co/GiliGold/Knesset-multi-e5-large) model. The embeddings are then fed into the regression models.
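
The two-step pipeline (encode the text, then apply each regression model to the embedding) can be sketched as a pair of helpers that load all three pickles and score a batch of texts. This is a minimal sketch under stated assumptions: `load_vad_models` and `score_texts` are hypothetical names, not part of this repository, and the code assumes each unpickled model exposes a scikit-learn-style `predict` that takes a 2-D input (one row per text), matching the `predict([embedding_vector])` call in the usage example.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Dict, List, Sequence

# File-name stems as listed in this repository: <dimension>_model.pkl
VAD_DIMENSIONS = ("valence", "arousal", "dominance")


def load_vad_models(model_dir: str) -> Dict[str, Any]:
    """Hypothetical helper: load the three VAD pickles from model_dir."""
    models = {}
    for dim in VAD_DIMENSIONS:
        path = Path(model_dir) / f"{dim}_model.pkl"
        # Only unpickle files from a trusted source: pickle can run code on load.
        with path.open("rb") as fh:
            models[dim] = pickle.load(fh)
    return models


def score_texts(
    texts: Sequence[str],
    encode: Callable[[Sequence[str]], Sequence[Sequence[float]]],
    models: Dict[str, Any],
) -> List[Dict[str, float]]:
    """Hypothetical helper: embed all texts once, then run every VAD model.

    Assumes each model's predict() takes a 2-D input (one row per text)
    and returns one score per row.
    """
    embeddings = encode(texts)  # one embedding row per input text
    per_dim = {dim: models[dim].predict(embeddings) for dim in VAD_DIMENSIONS}
    # Re-group the per-dimension score arrays into one dict per text
    return [
        {dim: float(per_dim[dim][i]) for dim in VAD_DIMENSIONS}
        for i in range(len(texts))
    ]
```

With sentence-transformers installed, `encode` could be `SentenceTransformer("GiliGold/Knesset-multi-e5-large").encode`, which returns one embedding row per string when given a list. Encoding once and reusing the embeddings for all three models avoids running the large encoder three times.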

## Training Data
The models were trained on a combination of three datasets, with the goal of making predictions robust and generalizable:
17
+
18
+ [Emobank Dataset](https://aclanthology.org/E17-2092/) (by buechel-hahn-2017-emobank): A comprehensive dataset containing emotional text data that we automaticaly translated to Hebrew using [Google/madlad400-3b-mt](https://huggingface.co/google/madlad400-3b-mt).
19
+ [Hebrew VAD Lexicon](https://huggingface.co/datasets/GiliGold/Hebrew_VAD_lexicon): A lexicon that provides VAD scores for Hebrew words.
20
+ [Knesset Sentences](https://huggingface.co/datasets/GiliGold/VAD_KnessetCorpus): A manually annotated set of 120 Knesset sentences with VAD scores, serving as an additional benchmark and source of training data.
21
+ This diverse training data allowed the models to capture nuanced emotional features across different text domains, especially in Hebrew.
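
The README states that all scores are normalized to the 0-to-1 range but does not say how the three sources, which use different annotation scales, were brought onto it. One plausible approach is min-max rescaling of each source by its scale bounds before pooling; the sketch below assumes that approach. The helper names and source keys are hypothetical, and the scale bounds must be supplied per source (EmoBank, for example, was rated on a 5-point scale, so its bounds would be 1 and 5).

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[str, float]  # (text, raw rating on the source's own scale)


def to_unit_interval(score: float, lo: float, hi: float) -> float:
    """Min-max rescale a rating from [lo, hi] to [0, 1]."""
    if hi <= lo:
        raise ValueError("scale upper bound must exceed lower bound")
    if not lo <= score <= hi:
        raise ValueError(f"rating {score} outside scale [{lo}, {hi}]")
    return (score - lo) / (hi - lo)


def pool_sources(
    sources: Dict[str, Tuple[Iterable[Sample], Tuple[float, float]]],
) -> List[Tuple[str, float, str]]:
    """Hypothetical helper: rescale every source to [0, 1] and concatenate.

    Each entry maps a source name to (samples, (scale_min, scale_max)).
    The source name is kept on each row so per-source evaluation stays possible.
    """
    pooled = []
    for name, (samples, (lo, hi)) in sources.items():
        for text, rating in samples:
            pooled.append((text, to_unit_interval(rating, lo, hi), name))
    return pooled
```

Keeping the source name on each pooled row makes it easy to hold out one source, such as the 120 annotated Knesset sentences, as a benchmark.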

## Model Details
- Model Type: Binomial Regression
- Input: Sentence embeddings produced by the Knesset-multi-e5-large model, i.e. the same feature extraction used during training. Raw text must be encoded before it is passed to a model.
- Output: A score on a continuous scale from 0 to 1. Each model predicts one dimension (valence, arousal, or dominance).

Each model is provided as a `.pkl` file and can be loaded with Python's `pickle` module.
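
The README does not name the link function or fitting library. A common form of binomial regression for a continuous target in [0, 1] is a logit-link model fitted by minimizing binomial cross-entropy (often called fractional logit). The sketch below assumes that formulation and uses plain gradient descent, which is likely not the solver used for the released models; it illustrates why a binomial model's predictions always fall strictly between 0 and 1. `fit_fractional_logit` and `predict` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (the inverse of the logit link)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Logit-link prediction: sigmoid of a linear score, so always in (0, 1)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_fractional_logit(
    X: List[List[float]], y: List[float], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Gradient descent on mean binomial cross-entropy with targets in [0, 1]."""
    n, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the cross-entropy w.r.t. the linear score is (p - y)
            err = predict(weights, bias, xi) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

Because the intercept's gradient is the mean residual, a converged fit reproduces the mean of the training targets. On real data, each row of `X` would be an embedding vector from the encoder described above.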

## Usage Example
```python
import pickle

from sentence_transformers import SentenceTransformer

# Encode the input text with the same embedding model the regressors were trained on
encoder = SentenceTransformer("GiliGold/Knesset-multi-e5-large")
sentence = "<your Hebrew sentence>"  # replace with the text to score
embedding_vector = encoder.encode(sentence)

# Load the valence model
with open("valence_model.pkl", "rb") as file:
    valence_model = pickle.load(file)

# The model scores a batch, so wrap the single vector in a list (one row per sentence)
valence_score = valence_model.predict([embedding_vector])

print(f"Predicted Valence Score: {valence_score[0]}")
```