---

license: cc-by-sa-4.0
datasets:
- GiliGold/VAD_KnessetCorpus
- HaifaCLGroup/KnessetCorpus
- GiliGold/Hebrew_VAD_lexicon
language:
- he
tags:
- vad
- valence
- arousal
- dominance
- regression
- knesset
---

# VAD Binomial Regression Models
This repository contains three binomial regression models designed to predict VAD (Valence, Arousal, Dominance) scores for text inputs. 
Each model is stored as a separate pickle (.pkl) file:

- **valence_model.pkl**: Predicts the Valence score (positivity/negativity).
- **arousal_model.pkl**: Predicts the Arousal score (level of excitement or calm).
- **dominance_model.pkl**: Predicts the Dominance score (sense of control or influence).

All scores are normalized on a scale from 0 to 1.

Before making predictions, input text must be converted into embeddings using the [Knesset-multi-e5-large](https://huggingface.co/GiliGold/Knesset-multi-e5-large) model. The embeddings are then fed into the regression models.

## Training Data
The models were trained on a combination of three datasets:

- A Hebrew version of the [EmoBank dataset](https://aclanthology.org/E17-2092/) (Buechel and Hahn, 2017): a dataset of emotionally annotated text that we automatically translated to Hebrew using [Google/madlad400-3b-mt](https://huggingface.co/google/madlad400-3b-mt).
- [Hebrew VAD Lexicon](https://huggingface.co/datasets/GiliGold/Hebrew_VAD_lexicon): A lexicon that provides VAD scores for Hebrew words.
- [Knesset Sentences](https://huggingface.co/datasets/GiliGold/VAD_KnessetCorpus): A manually annotated set of 120 Knesset sentences with VAD scores, serving as an additional benchmark and source of training data.

This diverse training data allowed the models to capture nuanced emotional features across different text domains, especially in Hebrew.

## Model Details
- Model Type: Binomial Regression
- Input: Embedding vectors of the input text, produced with [Knesset-multi-e5-large](https://huggingface.co/GiliGold/Knesset-multi-e5-large); feature extraction at inference time should match the training procedure.
- Output: VAD scores (valence, arousal, and dominance) on a continuous scale from 0 to 1.

Each model is provided as a .pkl file and can be loaded using Python's pickle module.

## Usage Example
```python
from sentence_transformers import SentenceTransformer
import pickle

sentence = "זה משפט לדוגמה"  # "This is an example sentence"
# Convert input text into embeddings using Knesset-multi-e5-large
model = SentenceTransformer('GiliGold/Knesset-multi-e5-large')
embedding_vector = model.encode(sentence)

# Load the valence model.
# Option 1: manually download the files from
# https://huggingface.co/GiliGold/VAD_binomial_regression_models/tree/main
# and load the local copy (uncomment to use):
# with open("valence_model.pkl", "rb") as file:
#     valence_model = pickle.load(file)

# Option 2: download using the Hugging Face Hub client
from huggingface_hub import hf_hub_download
repo_id = "GiliGold/VAD_binomial_regression_models"
model_v_path = hf_hub_download(repo_id=repo_id, filename="valence_model.pkl")
with open(model_v_path, "rb") as f:
    valence_model = pickle.load(f)

# `embedding_vector` is the vector obtained above from Knesset-multi-e5-large;
# predict() takes a list of vectors, hence the wrapping list
valence_score = valence_model.predict([embedding_vector])

print(f"Predicted Valence Score: {valence_score[0]}")
```
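
The same pattern extends to all three dimensions and to several sentences at once. The following is a minimal sketch rather than part of the released code: `load_vad_models` and `predict_vad` are hypothetical helper names. It assumes that the arousal and dominance models behave like the valence model above, meaning each exposes a `predict` method that accepts a list of embedding vectors and returns one score per vector.

```python
import pickle

REPO_ID = "GiliGold/VAD_binomial_regression_models"
DIMENSIONS = ("valence", "arousal", "dominance")


def load_vad_models(repo_id=REPO_ID):
    """Download and unpickle the three regression models (needs network access)."""
    # Imported lazily so predict_vad can be used without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    models = {}
    for name in DIMENSIONS:
        path = hf_hub_download(repo_id=repo_id, filename=f"{name}_model.pkl")
        with open(path, "rb") as f:
            models[name] = pickle.load(f)
    return models


def predict_vad(models, embeddings):
    """Return one {"valence", "arousal", "dominance"} dict per embedding.

    Hypothetical helper. Assumes every model in `models` has a `predict`
    method that takes a list of vectors and returns one score per vector.
    """
    embeddings = list(embeddings)
    # One predict() call per dimension, covering all sentences at once.
    per_dim = {name: models[name].predict(embeddings) for name in DIMENSIONS}
    return [
        {name: float(per_dim[name][i]) for name in DIMENSIONS}
        for i in range(len(embeddings))
    ]
```

To use it, encode a list of sentences with `SentenceTransformer('GiliGold/Knesset-multi-e5-large').encode(sentences)`, call `load_vad_models()` once, and pass the models and the embeddings to `predict_vad`. Loading the models once and reusing them avoids repeated downloads and unpickling.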