FelixTheWhale committed
Commit 546a528 · verified · 1 Parent(s): 1d40e39

Update README.md

Files changed (1):
  1. README.md +48 -3
README.md CHANGED
@@ -1,3 +1,48 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - google/gemma-3-1b-it
+ ---
+ # Emma-3-1B: Emotionally Modulated Gemma-3
+
+ **Model Author:** FelixTheWhale
+
+ ## Model Description
+
+ **Emma-3-1B** (Emotional Gemma 3) is an experimental implementation exploring emotional modulation within the Gemma-3 LLM architecture. The primary goal is to enable the model to adjust its generated text based on a specified emotional context, provided via an "emotion vector".
+
+ While it demonstrates some capacity for emotional modulation, this model primarily serves as an exploration of emotional states in transformer models.
+
+ ### Emotion Representation
+
+ The emotional context is encoded along **8 emotion dimensions**:
+
+ * SADNESS ↔ JOY
+ * FEAR ↔ COURAGE
+ * DISGUST ↔ ACCEPTANCE
+ * ANGER ↔ CALMNESS
+ * SURPRISE ↔ EXPECTATION
+ * DISTRUST ↔ TRUST
+ * BOREDOM ↔ INTEREST
+ * INDIFFERENCE ↔ EMPATHY
+
+ Each dimension is represented by a value (e.g., between -1 and 1), forming an 8-dimensional vector input.
+
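For illustration, here is a minimal sketch of how such a vector might be assembled. The axis ordering and the `make_emotion_vector` helper are assumptions made for this example, not part of the released code:

```python
# Hypothetical helper for building the 8-dimensional emotion vector.
# The axis order below is an assumption for illustration only.
import torch

EMOTION_AXES = [
    "sadness_joy", "fear_courage", "disgust_acceptance", "anger_calmness",
    "surprise_expectation", "distrust_trust", "boredom_interest", "indifference_empathy",
]

def make_emotion_vector(**values: float) -> torch.Tensor:
    """Return a (1, 8) tensor; each axis lies in [-1, 1], unspecified axes stay at 0 (neutral)."""
    vec = torch.zeros(1, len(EMOTION_AXES))
    for name, value in values.items():
        vec[0, EMOTION_AXES.index(name)] = max(-1.0, min(1.0, value))
    return vec

# Example: leaning toward JOY, CALMNESS, and EMPATHY.
emotion = make_emotion_vector(sadness_joy=0.8, anger_calmness=0.6, indifference_empathy=0.7)
```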
+ ## How it Works: Architecture
+
+ 1. **Base Model:** Starts from the pre-trained Gemma-3-1B-it model (`google/gemma-3-1b-it`). It may also work with other base models, with adjustments to `forward()`.
+ 2. **Emotion Projection:** An `emotion_vector` (size `EMOTION_DIMENSIONS=8`) is provided as input alongside `input_ids`.
+ 3. **Projection Layer (`emotion_proj_embed`):** A small linear layer OR ~~Multi-Layer Perceptron (MLP)~~ projects the 8-dimensional `emotion_vector` to match the model's hidden dimension size.
+ 4. **Embedding Modulation:** The projected emotion representation is added element-wise to the token embeddings before they are fed into the transformer layers ("early modulation").
+ 5. **Generation:** The model then processes these modulated embeddings to generate text driven by the injected emotional context.
+
+ *(Note: The model class inherits from `transformers.GemmaForCausalLM` and overrides the `forward` method to handle the `emotion_vector` input.)*
+
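As an illustration of this early-modulation step, below is a minimal sketch assuming a generic Hugging Face causal LM and a plain `nn.Linear` projection. The `EmotionalLM` wrapper name and its call signature are illustrative; the actual class in this repository subclasses the Gemma implementation directly and may differ in detail.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

EMOTION_DIMENSIONS = 8

class EmotionalLM(nn.Module):
    """Illustrative wrapper: adds a projected emotion vector to the token
    embeddings before the transformer layers ("early modulation")."""

    def __init__(self, model_name: str = "google/gemma-3-1b-it"):
        super().__init__()
        self.lm = AutoModelForCausalLM.from_pretrained(model_name)
        hidden_size = self.lm.config.hidden_size
        # Projects the 8-dim emotion vector into the model's hidden space.
        self.emotion_proj_embed = nn.Linear(EMOTION_DIMENSIONS, hidden_size)

    def forward(self, input_ids, attention_mask=None, emotion_vector=None, labels=None):
        # Ordinary token embeddings: (batch, seq_len, hidden)
        inputs_embeds = self.lm.get_input_embeddings()(input_ids)
        if emotion_vector is not None:
            # (batch, 8) -> (batch, 1, hidden), broadcast over the sequence dimension.
            emotion_embeds = self.emotion_proj_embed(emotion_vector).unsqueeze(1)
            inputs_embeds = inputs_embeds + emotion_embeds
        # Hand the modulated embeddings to the otherwise unchanged transformer stack.
        return self.lm(inputs_embeds=inputs_embeds, attention_mask=attention_mask, labels=labels)

# Example forward pass (uses the make_emotion_vector helper sketched earlier):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
#   batch = tok("How was your day?", return_tensors="pt")
#   model = EmotionalLM()
#   logits = model(batch["input_ids"], batch["attention_mask"],
#                  emotion_vector=make_emotion_vector(sadness_joy=0.9)).logits
```

Adding the projection once to the input embeddings leaves the transformer stack itself untouched; the projection layer is the only new module, which pairs naturally with the adapter-only fine-tuning described below.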
+ ## Training
+
+ * **Fine-tuning:** The model was fine-tuned with Parameter-Efficient Fine-Tuning (PEFT), specifically LoRA (Low-Rank Adaptation). Only the LoRA adapters and the `emotion_proj_embed` layer were trained.
+ * **Dataset:** Trained on a small custom dataset of short (128-token) text sequences paired with corresponding 8-dimensional emotion vectors.
+ * **Optimizer:** A custom optimizer configuration applies different learning rates to the `emotion_proj_embed` parameters versus the PEFT adapters (see the sketch after this list).
+ * **Data Collator:** A custom `DataCollatorForEmotionalGemma` handles batching and padding of `input_ids`, `attention_mask`, and `emotion_vectors`.
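
A hedged sketch of what such a setup could look like, reusing the illustrative `EmotionalLM` wrapper from the previous snippet; the LoRA targets, rank, and learning rates shown here are placeholders, not the values used for this checkpoint:

```python
import torch
from peft import LoraConfig, get_peft_model

# Placeholder LoRA configuration; the actual rank/targets are not documented here.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = EmotionalLM("google/gemma-3-1b-it")        # illustrative wrapper from the sketch above
model.lm = get_peft_model(model.lm, lora_config)   # freezes base weights, keeps adapters trainable

# Two parameter groups: a higher learning rate for the freshly initialized
# projection layer, a lower one for the LoRA adapters (values are placeholders).
optimizer = torch.optim.AdamW([
    {"params": model.emotion_proj_embed.parameters(), "lr": 1e-4},
    {"params": [p for p in model.lm.parameters() if p.requires_grad], "lr": 2e-5},
])
```

During batching, the custom collator additionally stacks each example's 8-dimensional emotion vector alongside the padded `input_ids` and `attention_mask`, so the `emotion_vector` argument arrives as a `(batch, 8)` tensor.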