---
license: cc-by-sa-4.0
datasets:
- GiliGold/VAD_KnessetCorpus
- HaifaCLGroup/KnessetCorpus
- GiliGold/Hebrew_VAD_lexicon
language:
- he
tags:
- vad
- valence
- arousal
- dominance
- regression
- knesset
---

# VAD Binomial Regression Models

This repository contains three binomial regression models designed to predict VAD (Valence, Arousal, Dominance) scores for text inputs.

Each model is stored as a separate pickle (.pkl) file:

- **valence_model.pkl**: Predicts the Valence score (positivity/negativity).
- **arousal_model.pkl**: Predicts the Arousal score (level of excitement or calm).
- **dominance_model.pkl**: Predicts the Dominance score (sense of control or influence).

All scores are normalized to a scale from 0 to 1.
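
This card does not say how the scores were normalized. As a rough sketch, assuming ratings on a 1 to 5 scale (as in the original EmoBank annotations) rescaled with min-max normalization, the mapping could look like the following. The function name `rescale_to_unit` and the scale bounds are illustrative assumptions, not taken from this repository.

```python
def rescale_to_unit(score: float, low: float = 1.0, high: float = 5.0) -> float:
    """Min-max rescale a rating from [low, high] to [0, 1].

    Assumption: a 1-5 rating scale. This is an illustration, not the
    repository's confirmed preprocessing.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside [{low}, {high}]")
    return (score - low) / (high - low)


# A neutral rating of 3 on a 1-5 scale maps to the midpoint of [0, 1].
print(rescale_to_unit(3.0))  # 0.5
```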

Before making predictions, input text must be converted into embeddings using the [Knesset-multi-e5-large](https://huggingface.co/GiliGold/Knesset-multi-e5-large) model. The embeddings are then fed into the regression models.

## Training Data

The models were trained on a combination of three datasets, chosen to make predictions more robust across text domains:

- A Hebrew version of the [EmoBank dataset](https://aclanthology.org/E17-2092/) (Buechel and Hahn, 2017): A corpus of English sentences annotated with VAD scores, which we automatically translated to Hebrew using [Google/madlad400-3b-mt](https://huggingface.co/google/madlad400-3b-mt).
- [Hebrew VAD Lexicon](https://huggingface.co/datasets/GiliGold/Hebrew_VAD_lexicon): A lexicon that provides VAD scores for Hebrew words.
- [Knesset Sentences](https://huggingface.co/datasets/GiliGold/VAD_KnessetCorpus): A manually annotated set of 120 Knesset sentences with VAD scores, serving as an additional benchmark and source of training data.

This mix of sentence-level, word-level, and parliamentary data lets the models capture emotional features across different text domains, especially in Hebrew.

## Model Details

- Model Type: Binomial Regression
- Input: Text embeddings produced by Knesset-multi-e5-large (feature extraction at inference time must match the training procedure).
- Output: VAD scores (valence, arousal, and dominance) on a continuous scale from 0 to 1.

Each model is provided as a .pkl file and can be loaded using Python's pickle module.
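
To illustrate why the outputs stay between 0 and 1: a binomial regression with the canonical logit link passes a linear combination of the input features through the logistic function. The sketch below assumes a logit link and uses made-up weights (`predict_score`, `weights`, and `bias` are hypothetical names). The link function and parameters of the released models are not documented in this card.

```python
import math
from typing import Sequence


def predict_score(embedding: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Return sigmoid(w . x + b), the mean of a logit-link binomial regression."""
    if len(embedding) != len(weights):
        raise ValueError("embedding and weights must have the same length")
    linear = bias + sum(w * x for w, x in zip(weights, embedding))
    # Numerically stable logistic function: never exponentiates a large positive number.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


# Toy 3-dimensional "embedding" with made-up weights (real embeddings have many more dimensions).
print(predict_score([0.2, -0.1, 0.4], weights=[1.0, 2.0, -0.5], bias=0.0))
```

Whatever the size of the linear term, the result is bounded by 0 and 1, which matches the normalized VAD scale.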

## Usage Example

```python
from sentence_transformers import SentenceTransformer
import pickle

sentence = "זה משפט לדוגמה"  # "This is an example sentence"
# Convert input text into embeddings using Knesset-multi-e5-large
model = SentenceTransformer('GiliGold/Knesset-multi-e5-large')
embedding_vector = model.encode(sentence)

# Load the valence model
# Option 1: Manually download files from https://huggingface.co/GiliGold/VAD_binomial_regression_models/tree/main
with open("valence_model.pkl", "rb") as file:
    valence_model = pickle.load(file)

# Option 2: Download using Hugging Face hub
from huggingface_hub import hf_hub_download
repo_id = "GiliGold/VAD_binomial_regression_models"
model_v_path = hf_hub_download(repo_id=repo_id, filename="valence_model.pkl")
with open(model_v_path, "rb") as f:
    valence_model = pickle.load(f)

# Assume `embedding_vector` is the vector obtained from the Knesset-multi model
valence_score = valence_model.predict([embedding_vector])

print(f"Predicted Valence Score: {valence_score[0]}")
```
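
The same steps extend to all three dimensions. The sketch below wraps them in two helpers, `load_vad_models` and `predict_vad`; both names are illustrative and not part of this repository. It assumes each unpickled model exposes a `predict` method that accepts a list of embedding vectors, as in the example above, and that the file names match the three listed at the top of this card.

```python
import pickle
from typing import Any, Dict, Sequence

REPO_ID = "GiliGold/VAD_binomial_regression_models"
DIMENSIONS = ("valence", "arousal", "dominance")


def load_vad_models(repo_id: str = REPO_ID) -> Dict[str, Any]:
    """Download and unpickle the three models from the Hugging Face Hub."""
    # Imported here so predict_vad can be used without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    models = {}
    for name in DIMENSIONS:
        path = hf_hub_download(repo_id=repo_id, filename=f"{name}_model.pkl")
        with open(path, "rb") as f:
            models[name] = pickle.load(f)
    return models


def predict_vad(embedding: Sequence[float], models: Dict[str, Any]) -> Dict[str, float]:
    """Return one score per VAD dimension for a single embedding vector."""
    return {
        name: float(model.predict([embedding])[0])
        for name, model in models.items()
    }
```

With `embedding_vector` from the example above, `predict_vad(embedding_vector, load_vad_models())` returns a dictionary keyed by `valence`, `arousal`, and `dominance`.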