---
language:
- en
library_name: keras
pipeline_tag: image-classification
---

# Model Card for Facial Expression Recognition

This model classifies emotions into one of seven categories: anger, happy, sad, fear, surprise, disgust, and neutral.

## Model Details

Dataset:

- Train: Happy - 14,379 / Angry - 7,988 / Disgust - 872 / Sad - 9,768 / Neutral - 9,947 / Fear - 8,200 / Surprise - 6,376
- Test: Happy - 3,599 / Angry - 1,918 / Disgust - 222 / Sad - 2,386 / Neutral - 2,449 / Fear - 2,042 / Surprise - 1,628
- Val: Happy - 2,880 / Angry - 1,600 / Disgust - 172 / Sad - 1,954 / Neutral - 1,990 / Fear - 1,640 / Surprise - 1,628

Model:

1. Transfer learning using MobileNetV2 with two additional Dense layers and an output layer with a softmax activation function.
2. Class weights were used to adjust for class imbalance.
3. Total params: 3,675,823
4. Trainable params: 136,839
5. Accuracy: 0.823 | Precision: 0.825 | Recall: 0.823 | F1: 0.821

Room for improvement:

This model was created with extremely limited hardware acceleration (GPU) resources. It is therefore highly likely that evaluation metrics surpassing the 95% mark could be achieved in the following ways:

1. MobileNetV2 was chosen for its fast inference and low latency, but with more resources a more suitable base model could be found.
2. Data augmentation to better correct for class imbalance.
3. Using a learning rate scheduler to train for longer (with a lower learning rate) after nearing a local minimum (approx. 60 epochs).

## Uses

Cannot be used for commercial purposes in the EU.

### Direct Use

Combine with the OpenCV Haar cascade for face detection, as in the example below.

## How to Get Started with the Model

Use the code below to run the model locally against a webcam feed:

```python
import cv2
import numpy as np
import tensorflow as tf


def display_emotion(frame, model):
    """Detect faces in the frame and annotate each with the predicted emotion."""
    font = cv2.FONT_HERSHEY_SIMPLEX
    text_color = (0, 0, 255)
    class_labels = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    faces = face_cascade.detectMultiScale(gray, 1.1, 4)

    if len(faces) == 0:
        print("Face not detected...")

    for (x, y, w, h) in faces:
        roi_color = frame[y:y+h, x:x+w]
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)  # Green square around the face

        # Resize the face crop to the model's expected input size and add a batch dimension
        resized_image = cv2.resize(roi_color, (224, 224))
        final_image = np.expand_dims(resized_image, axis=0)

        predictions = model.predict(final_image)
        predicted_label = class_labels[np.argmax(predictions)]

        # Black background rectangle behind the label
        cv2.rectangle(frame, (x, y), (x+w, y-25), (0, 0, 0), -1)
        # Add the predicted label as text
        cv2.putText(frame, predicted_label, (x, y-10), font, 0.7, text_color, 2)

    return frame


def main():
    model = tf.keras.models.load_model('best_model.keras')

    # Try the external webcam first, then fall back to the default camera
    cap = cv2.VideoCapture(1)
    if not cap.isOpened():
        cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        raise IOError("Cannot open webcam")

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = display_emotion(frame, model)
        cv2.imshow('Facial Expression Recognition', frame)
        if cv2.waitKey(2) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
```

### Training Data

Dataset used: FER (available on Kaggle).

#### Preprocessing

MobileNetV2 receives image inputs of size (224, 224).
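For a quick sanity check without a webcam, the snippet below sketches the same preprocessing on a single cropped face image. The file name `face_crop.jpg` is a placeholder, and, as in the webcam demo above, the only preprocessing assumed is resizing to 224x224 and adding a batch dimension (no extra rescaling is applied here, on the assumption that any normalization is handled inside the saved model).

```python
import cv2
import numpy as np
import tensorflow as tf

# Minimal single-image sketch; 'face_crop.jpg' is a placeholder for an already-cropped face.
model = tf.keras.models.load_model('best_model.keras')
class_labels = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']

face = cv2.imread('face_crop.jpg')      # BGR image of a cropped face
face = cv2.resize(face, (224, 224))     # MobileNetV2 input size
batch = np.expand_dims(face, axis=0)    # shape (1, 224, 224, 3)

probs = model.predict(batch)[0]
print(class_labels[int(np.argmax(probs))], float(np.max(probs)))
```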
#### Speeds, Sizes, Times

Latency (local demo, no GPU): 39 ms/step

## Model Card Authors

Ronny Nehme