---
language:
- en
library_name: keras
pipeline_tag: image-classification
---

# Model Card for Facial Emotion Classification

This model classifies facial expressions into one of seven emotion categories: angry, happy, sad, fear, surprise, disgust, and neutral.

## Model Details

Dataset (image counts per class):

| Split | Happy  | Angry | Disgust | Sad   | Neutral | Fear  | Surprise |
|-------|--------|-------|---------|-------|---------|-------|----------|
| Train | 14,379 | 7,988 | 872     | 9,768 | 9,947   | 8,200 | 6,376    |
| Test  | 3,599  | 1,918 | 222     | 2,386 | 2,449   | 2,042 | 1,628    |
| Val   | 2,880  | 1,600 | 172     | 1,954 | 1,990   | 1,640 | 1,628    |

Model:

1. Transfer learning using MobileNetV2 with two additional Dense layers and a softmax output layer (see the sketch after this list).
2. Class weights were used during training to compensate for the class imbalance.
3. Total params: 3,675,823
4. Trainable params: 136,839
5. Accuracy: 0.823 | Precision: 0.825 | Recall: 0.823 | F1: 0.821
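
A minimal sketch of this setup in Keras. The hidden-layer sizes, optimizer, and class-weight formula are illustrative assumptions, not the exact training configuration; the class order is assumed alphabetical, matching the labels used in the demo script below:

    import numpy as np
    import tensorflow as tf

    # Frozen MobileNetV2 base pretrained on ImageNet; only the new head trains.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights='imagenet')
    base.trainable = False

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(128, activation='relu'),  # hidden sizes assumed
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(7, activation='softmax'),  # seven emotion classes
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # "Balanced" class weights computed from the train counts above
    # (alphabetical order: angry, disgust, fear, happy, neutral, sad, surprise).
    counts = np.array([7988, 872, 8200, 14379, 9947, 9768, 6376])
    class_weight = dict(enumerate(counts.sum() / (len(counts) * counts)))

    # model.fit(train_ds, validation_data=val_ds,
    #           class_weight=class_weight, epochs=60)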

## Room for Improvement

This model was trained with extremely limited hardware acceleration (GPU) resources. It is therefore highly likely that evaluation metrics surpassing the 95% mark could be achieved by:

1. Swapping the base model: MobileNetV2 was chosen for its fast inference and low latency, but with more resources a more suitable base model could be found.
2. Data augmentation, to better correct for the class imbalance.
3. Learning rate decay, to keep training (with a lower LR) after nearing a local minimum at approximately 60 epochs (see the sketch after this list).
4. Error analysis.
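
As an illustration of points 2 and 3, Keras ships built-in augmentation layers and learning-rate schedules; the specific transforms, decay rate, and step counts below are assumptions:

    import tensorflow as tf

    # Point 2: augmentation layers that can be prepended to the model to
    # synthetically enlarge under-represented classes such as Disgust.
    augment = tf.keras.Sequential([
        tf.keras.layers.RandomFlip('horizontal'),
        tf.keras.layers.RandomRotation(0.1),
        tf.keras.layers.RandomZoom(0.1),
    ])

    # Point 3: exponential learning-rate decay so training can continue
    # with smaller steps after the ~60-epoch plateau.
    steps_per_epoch = 500  # placeholder; depends on dataset and batch size
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3,
        decay_steps=20 * steps_per_epoch,  # halve the LR every ~20 epochs
        decay_rate=0.5)
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)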


## Uses

The model cannot be used for commercial purposes in the EU.

### Direct Use

Combine with an OpenCV Haar cascade classifier for face detection, as in the demo script below.

## How to Get Started with the Model

Use the script below to try the model locally with your device's camera:

    import cv2
    import numpy as np
    import tensorflow as tf

    # Load the Haar cascade once rather than on every frame.
    FACE_CASCADE = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    CLASS_LABELS = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']

    def display_emotion(frame, model):
        font = cv2.FONT_HERSHEY_SIMPLEX
        text_color = (0, 0, 255)

        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = FACE_CASCADE.detectMultiScale(gray, 1.1, 4)

        if len(faces) == 0:
            print("Face not detected...")

        for (x, y, w, h) in faces:
            # Crop the detected face and resize it to the (224, 224) input
            # expected by MobileNetV2, then add a batch dimension.
            face_roi = frame[y:y+h, x:x+w]
            resized_image = cv2.resize(face_roi, (224, 224))
            final_image = np.expand_dims(resized_image, axis=0)

            predictions = model.predict(final_image)
            predicted_label = CLASS_LABELS[np.argmax(predictions)]

            # Black background bar for the label text, the label itself,
            # and a green bounding box around the face.
            cv2.rectangle(frame, (x, y), (x+w, y-25), (0, 0, 0), -1)
            cv2.putText(frame, predicted_label, (x, y-10), font, 0.7, text_color, 2)
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)

        return frame
    
    def main():
        model = tf.keras.models.load_model('emotion_detection.keras')

        # Try an external camera first, then fall back to the default one.
        cap = cv2.VideoCapture(1)
        if not cap.isOpened():
            cap = cv2.VideoCapture(0)
        if not cap.isOpened():
            raise IOError("Cannot open webcam")
    
        while True:
            ret, frame = cap.read()
            if not ret:
                break
    
            frame = display_emotion(frame, model)
            cv2.imshow('Facial Expression Recognition', frame)
    
            # Press 'q' to quit.
            if cv2.waitKey(2) & 0xFF == ord('q'):
                break
    
        cap.release()
        cv2.destroyAllWindows()
    
    if __name__ == "__main__":
        main()

#### Preprocessing

MobileNetV2 receives image inputs of size (224, 224).
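
The demo script's resize step satisfies this requirement. If the saved model does not already include a rescaling layer, inputs may also need the standard MobileNetV2 pixel scaling; this is an assumption, since the training pipeline is not shown here:

    import numpy as np
    import tensorflow as tf

    # Stand-in for a cropped face (any HxWx3 uint8 image works).
    img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

    # Scale pixels from [0, 255] to the [-1, 1] range MobileNetV2 was
    # pretrained with, and add a batch dimension.
    batch = tf.keras.applications.mobilenet_v2.preprocess_input(
        np.expand_dims(img.astype(np.float32), axis=0))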

#### Speeds, Sizes, Times

Latency (local demo, no GPU): 39 ms/step

## Model Card Authors

Ronny Nehme