{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "FyRdDYkqAKN4" }, "source": [ "# BISINDO Sign Language Translator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Description\n", "\n", "This project is a final project for \"Rekayasa Sistem Berbasis Pengetahuan (RSBP)\". This project is a web-based application that can translate BISINDO sign language to Indonesian language. This application is made to help people who are not familiar with BISINDO sign language to communicate with people who are deaf.
\n", "\n", "## Appoarch Method\n", "\n", "This project employs the Object Detection method to identify hand gestures, utilizing a customized YOLO v8 dataset model. This model is designed to detect various hand gestures and provide the bounding box coordinates delineating these gestures. Once the bounding box is identified, the corresponding region of interest (ROI) containing the hand gesture is cropped. Subsequently, this cropped image undergoes further processing through a Convolutional Neural Network (CNN) model for precise classification of the detected hand gesture. The CNN model's output yields the specific label corresponding to the gesture identified. To enhance user experience, these labels are translated into Indonesian using a Rule-Based method before being displayed on the screen interface.
\n", "\n", "In terms of visualization, the project harnesses MediaPipe, a tool enabling the drawing of hand gesture bounding boxes and their associated labels directly on the screen. This integration ensures users can visually comprehend the detected gestures in real-time with graphical representations.
\n", "\n", "Moreover, the project is built as a web-based application using Flask, allowing seamless accessibility and interaction through a web interface. This framework facilitates the deployment of the hand gesture detection system within a browser, ensuring ease of use across various devices without necessitating complex setups or installations.
\n", "\n", "## Visual Demo\n", "\n", "### Dashboard\n", "![dashboard](../img/image.png)
\n", "\n", "### Stream Youtube Detection\n", "![yt](../img/image-2.png)
\n", "\n", "### Camera Detection\n", "![run](../img/image-1.png)
\n", "\n", "## Step by Step Method\n", "- Train the model using YOLOv8 with custom dataset\n", "- Create a CNN model to classify the cropped image\n", "- Create a Flask app to run the model\n", "- Create a web interface using HTML and CSS\n", "- Create feature interaction using Javascript \n", "\n", "## Features\n", "- Detect hand gesture from Camera\n", "- Detect hand gesture from Youtube video\n", "- Switch on/off the detection\n", "- Switch on/off the landmark\n", "- Flip the video\n", "- Modify Confidence Threshold\n", "- The result accumulates the detected words over 10 sequential frames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Training Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check GPU" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Y8cDtxLIBHgQ", "outputId": "d2185ab5-0f43-47b6-b675-937ee56331d5" }, "outputs": [], "source": [ "!nvidia-smi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import OS" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "CjpPg4mGKc1v", "outputId": "c0bb0d9a-2cd3-4bea-d38a-070b74a5a89b" }, "outputs": [], "source": [ "import os\n", "HOME = os.getcwd()\n", "print(HOME)" ] }, { "cell_type": "markdown", "metadata": { "id": "3C3EO_2zNChu" }, "source": [ "### Yolo Instalation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "tdSMcABDNKW-", "outputId": "6193e25a-2cfd-4d96-fe9f-769add6166f7" }, "outputs": [], "source": [ "%pip install ultralytics==8.0.20\n", "\n", "from IPython import display\n", "display.clear_output()\n", "\n", "import ultralytics\n", "ultralytics.checks()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "VOEYrlBoP9-E" }, "outputs": [], "source": [ "from ultralytics import YOLO\n", "\n", "from IPython.display import display, Image" ] }, { "cell_type": "markdown", "metadata": { "id": "fT1qD4toTTw0" }, "source": [ "### Import Dataset Form Roboflow" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "BSd93ZJzZZKt", "outputId": "eb8bcd99-cea8-48a1-bf11-7a37f2aa7018" }, "outputs": [], "source": [ "%mkdir {HOME}/datasets\n", "%cd {HOME}/datasets\n", "\n", "%pip install roboflow --quiet\n", "\n", "from roboflow import Roboflow\n", "\n", "from roboflow import Roboflow\n", "rf = Roboflow(api_key=\"moOAxzoPZOtIzhyyco0r\")\n", "project = rf.workspace(\"bisindo-qndjb\").project(\"bisindov2\")\n", "dataset = project.version(1).download(\"yolov8\")" ] }, { "cell_type": "markdown", "metadata": { "id": "YUjFBKKqXa-u" }, "source": [ "### Training" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "D2YkphuiaE7_", "outputId": "7f5ea4f9-6e7d-4470-8209-9a253786cd13" }, "outputs": [], "source": [ "%cd {HOME}\n", "\n", "yolo task=detect mode=train model=yolov8s.pt data={dataset.location}/data.yaml epochs=25 imgsz=800 plots=True" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1MScstfHhArr", "outputId": "7865d222-af3a-49bd-9aa7-3816872815c9" }, "outputs": [], "source": [ "%ls {HOME}/runs/detect/train/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 485 }, "id": "_J35i8Ofhjxa", "outputId": "762ae444-f7c9-42f5-cd2c-8ea7facee223" }, "outputs": [], "source": [ "%cd {HOME}\n", "Image(filename=f'{HOME}/runs/detect/train/confusion_matrix.png', width=600)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 335 }, "id": "A-urTWUkhRmn", "outputId": "58b67f20-8844-43f4-ff41-c8c498036fc8" }, "outputs": [], "source": [ "%cd {HOME}\n", "Image(filename=f'{HOME}/runs/detect/train/results.png', width=600)" ] }, { "cell_type": "markdown", "metadata": { "id": "6ODk1VTlevxn" }, "source": [ "### Validate Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "YpyuwrNlXc1P", "outputId": "88f5e42d-e10d-45db-c578-7c85f8667c10" }, "outputs": [], "source": [ "%cd {HOME}\n", "\n", "!yolo task=detect mode=val model={HOME}/runs/detect/train/weights/best.pt data={dataset.location}/data.yaml" ] }, { "cell_type": "markdown", "metadata": { "id": "i4eASbcWkQBq" }, "source": [ "### Inference Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Wjc1ctZykYuf", "outputId": "f400774d-404b-4fef-ef58-796fc4fd522e" }, "outputs": [], "source": [ "%cd {HOME}\n", "!yolo task=detect mode=predict model={HOME}/runs/detect/train/weights/best.pt conf=0.25 source={dataset.location}/test/images save=True" ] }, { "cell_type": "markdown", "metadata": { "id": "mEYIo95n-I0S" }, "source": [ "### Run Model on CAMERA" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "As_cl3OkQWQ1", "outputId": "759e88f3-c3b8-474f-d217-35765bd81d06" }, "outputs": [], "source": [ "%cd {HOME}\n", "!yolo task=detect mode=predict model=D:/Kuliah/\\(2023\\)S5-RSBP/FP/runs/detect/train/weights/best.pt source=0 show=true" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training Result\n", "\n", "### Confussion Matrix\n", "![dashboard](../img/confussion_matrix.png)
\n", "\n", "### Result Curve\n", "![dashboard](../img/result.png)
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Flask Web Application" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install Requirements Dependencies\n", "To install the Python requirements from the `requirements.txt` file, run the following command in your terminal or command prompt:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pip install -r requirements.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create new app.py file for main program\n", "Import libray that needed for this project" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ultralytics import YOLO\n", "import time\n", "import numpy as np\n", "import mediapipe as mp\n", "\n", "\n", "import cv2\n", "from flask import Flask, render_template, request, Response, session, redirect, url_for\n", "\n", "from flask_socketio import SocketIO\n", "import yt_dlp as youtube_dl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Explanation for each library:\n", "- ultralytics.YOLO: Used for real-time object detection.\n", "- time: Handles time-related functions.\n", "- numpy as np: Supports numerical computations and arrays.\n", "- mediapipe as mp: Facilitates various media processing tasks like object detection and hand tracking.\n", "- cv2: Offers tools for computer vision, image, and video processing.\n", "- Flask: A lightweight web framework for building web applications.\n", "- Flask_socketio and SocketIO: Enables WebSocket support for real-time communication in Flask.\n", "- yt_dlp as youtube_dl: Used to stream media content from various streaming sites, like YouTube." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialize Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_object_detection = YOLO(\"bisindo.pt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create Function for Detection" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def show(self, url):\n", " print(url)\n", " self._preview = False\n", " self._flipH = False\n", " self._detect = False\n", " self._mediaPipe = False\n", "\n", " self._confidence = 75.0\n", " ydl_opts = {\n", " \"quiet\": True,\n", " \"no_warnings\": True,\n", " \"format\": \"best\",\n", " \"forceurl\": True,\n", " }\n", "\n", " if url == '0':\n", " cap = cv2.VideoCapture(0)\n", " else:\n", " \n", " ydl = youtube_dl.YoutubeDL(ydl_opts)\n", "\n", " info = ydl.extract_info(url, download=False)\n", " url = info[\"url\"]\n", "\n", " cap = cv2.VideoCapture(url)\n", "\n", " while True:\n", " if self._preview:\n", " if stop_flag:\n", " print(\"Process Stopped\")\n", " return\n", "\n", " grabbed, frame = cap.read()\n", " if not grabbed:\n", " break\n", " if self.flipH:\n", " frame = cv2.flip(frame, 1)\n", "\n", " if self.detect:\n", " frame_yolo = frame.copy()\n", " results_yolo = model_object_detection.predict(frame_yolo, conf=self._confidence / 100)\n", "\n", " frame_yolo, labels = results_yolo[0].plot()\n", " list_labels = []\n", " # labels_confidences\n", " for label in labels:\n", " confidence = label.split(\" \")[-1]\n", " label = (label.split(\" \"))[:-1]\n", " label = \" \".join(label)\n", " list_labels.append(label)\n", " list_labels.append(confidence)\n", " socketio.emit('label', list_labels)\n", "\n", " if self.mediaPipe:\n", " # Convert the image to RGB for processing with MediaPipe\n", " image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n", " results = self.hands.process(image)\n", " \n", " if results.multi_hand_landmarks:\n", " for hand_landmarks in results.multi_hand_landmarks:\n", " mp.solutions.drawing_utils.draw_landmarks(\n", " frame,\n", " hand_landmarks,\n", " self.mp_hands.HAND_CONNECTIONS,\n", " landmark_drawing_spec=mp.solutions.drawing_utils.DrawingSpec(color=(255, 0, 0), thickness=4, circle_radius=3),\n", " connection_drawing_spec=mp.solutions.drawing_utils.DrawingSpec(color=(255, 255, 255), thickness=2, circle_radius=2), \n", " )\n", "\n", " frame = cv2.imencode(\".jpg\", frame)[1].tobytes()\n", " yield ( \n", " b'--frame\\r\\n'\n", " b'Content-Type: image/jpeg\\r\\n\\r\\n' + frame + b'\\r\\n'\n", " )\n", " else:\n", " snap = np.zeros((\n", " 1000,\n", " 1000\n", " ), np.uint8)\n", " label = \"Streaming Off\"\n", " H, W = snap.shape\n", " font = cv2.FONT_HERSHEY_PLAIN\n", " color = (255, 255, 255)\n", " cv2.putText(snap, label, (W//2 - 100, H//2),\n", " font, 2, color, 2)\n", " frame = cv2.imencode(\".jpg\", snap)[1].tobytes()\n", " yield (b'--frame\\r\\n'\n", " b'Content-Type: image/jpeg\\r\\n\\r\\n' + frame + b'\\r\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Explanation for each object:\n", "- preview(): Displays the video stream on the web interface.\n", "- flipH(): Flips the video horizontally.\n", "- detect(): Detects using Yolo model the hand gesture from the video stream.\n", "- mediaPipe(): Draws the hand gesture landmark on the video stream.\n", "\n", "### Explanation for flow of the program:\n", "- If user input is \"camera\", the program will run the camera detection.\n", "- If user input is \"url youtube video\", the program will run the youtube detection.\n", "- If user activate the preview, the program will run the video stream/camera.\n", "- If user activate the detection, the program will run the detection.\n", "- If user activate the landmark, the program will run the landmark.\n", "- If user activate the flip, video will be flipped.\n", "- Threshold is used to set the minimum confidence threshold of the detection.\n", "- If the preview is not activated, the program will show `streaming off` label.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Integration in HTML, CSS, and Javascript" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### In CSS file, create a style for the stream and output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "/* * Local selectors */\n", "#container {\n", " width: 100%;\n", " height: 586px;\n", " border: 8px #2c374a solid;\n", " background-color: #0F172A;\n", " border-radius: 5px;\n", "}\n", "\n", "#videoElement {\n", " height: 570px;\n", " width: 100%;\n", " background-color: #0F172A;\n", "\n", " display: block;\n", " margin-left: auto;\n", " margin-right: auto;\n", "}\n", "\n", "#terminal {\n", " border-radius: 5px;\n", " border: 5px #1C2637 solid;\n", " font-family: monospace;\n", " font-size: 12px;\n", " background-color: #0F172A;\n", " height: 490px;\n", " overflow-y: scroll;\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### In Javascript, create a function needed for the web interface" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For Camera or Video Button" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function startCamera() {\n", " var url = '0';\n", " $('#urlForm').attr('action', '/index'); \n", " $('#urlForm').attr('method', 'POST'); \n", " $('#urlForm').find('#url').val(url);\n", " $('#urlForm').submit();\n", "}\n", "\n", "function startVideo() {\n", " var url = $('#url').val();\n", " $('#urlForm').attr('action', '/index'); \n", " $('#urlForm').attr('method', 'POST'); \n", " $('#urlForm').find('#url').val(url);\n", " $('#urlForm').submit();\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For terminal output, socket, and final output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var socket = io.connect('http://127.0.0.1:5000/');\n", "\n", "let consecutiveWords = [];\n", "let finalSentence = \"\";\n", "let wordCounter = 0;\n", "\n", "function appendToTerminal(message) {\n", " var terminal = document.getElementById(\"terminal\");\n", " var p = document.createElement(\"p\");\n", " p.innerHTML = `\n", " \n", " \n", " \n", " \n", "
${message[0]}${message[1]}
`;\n", " terminal.appendChild(p);\n", " terminal.scrollTop = terminal.scrollHeight;\n", "\n", " if (consecutiveWords.length === 0 || consecutiveWords[consecutiveWords.length - 1] === message[0]) {\n", " consecutiveWords.push(message[0]);\n", " wordCounter++; \n", " } else {\n", " consecutiveWords = [message[0]];\n", " wordCounter = 1;\n", " }\n", "\n", " if (wordCounter >= 10) {\n", " finalSentence += (finalSentence.length > 0 ? \" \" : \"\") + consecutiveWords[0];\n", " document.getElementById(\"finalSentencePara\").innerText = finalSentence;\n", " consecutiveWords = [];\n", " wordCounter = 0;\n", " }\n", "}\n", "\n", "socket.on(\"label\", (data) => {\n", " appendToTerminal(data);\n", "});" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Integration with SocketIO:\n", "\n", "- Listens for data labeled as \"label\" from a SocketIO connection.\n", "- Calls appendToTerminal() to display the received data in the terminal and potentially update an advertisement based on the data.\n", "\n", "Function appendToTerminal(message):\n", "\n", "- Takes a message as input.\n", "- Adds a table with two columns to the terminal for displaying the message.\n", "- Keeps track of consecutive words and their counts.\n", "- Constructs a final sentence if a word appears more than ten times consecutively." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For Toggle Button" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function toggleHSwitch() {\n", " var switchElement = $(\"#flip-horizontal\");\n", " var switchIsOn = switchElement.is(\":checked\");\n", "\n", " if (switchIsOn) {\n", " console.log(\"SWITCH ON\")\n", " $.getJSON(\"/request_flipH_switch\", function (data) {\n", " console.log(\"Switch on request sent.\");\n", " });\n", " } else {\n", " console.log(\"SWITCH OFF\")\n", " $.getJSON(\"/request_flipH_switch\", function (data) {\n", " console.log(\"Switch off request sent.\");\n", " });\n", " }\n", "}\n", "\n", "function toggleMediaPipeSwitch() {\n", " var switchElement = $(\"#mediapipe\");\n", " var switchIsOn = switchElement.is(\":checked\");\n", "\n", " if (switchIsOn) {\n", " console.log(\"SWITCH ON\")\n", " $.getJSON(\"/request_mediapipe_switch\", function (data) {\n", " console.log(\"Switch on request sent.\");\n", " });\n", " } else {\n", " console.log(\"SWITCH OFF\")\n", " $.getJSON(\"/request_mediapipe_switch\", function (data) {\n", " console.log(\"Switch off request sent.\");\n", " });\n", " }\n", "}\n", "\n", "function toggleDetSwitch() {\n", "\n", " var switchElement = $(\"#run_detection\");\n", " var switchIsOn = switchElement.is(\":checked\");\n", "\n", " if (switchIsOn) {\n", " console.log(\"SWITCH ON\")\n", " $.getJSON(\"/request_run_model_switch\", function (data) {\n", " console.log(\"Switch on request sent.\");\n", " });\n", " } else {\n", " console.log(\"SWITCH OFF\")\n", " $.getJSON(\"/request_run_model_switch\", function (data) {\n", " console.log(\"Switch off request sent.\");\n", " });\n", " }\n", "}\n", "\n", "function toggleOffSwitch() {\n", " var switchElement = $(\"#turn_off\");\n", " var switchIsOn = switchElement.is(\":checked\");\n", "\n", " if (switchIsOn) {\n", " console.log(\"Camera ON\")\n", " $.getJSON(\"/request_preview_switch\", function (data) {\n", " console.log(\"Switch on request sent.\");\n", " });\n", " } else {\n", " console.log(\"Camera OFF\")\n", " $.getJSON(\"/request_preview_switch\", function (data) {\n", " console.log(\"Switch off request sent.\");\n", " });\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### For HTML, integrate the Javascript function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For Camera and Terminal Output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "
\n", "
\n", " \n", "
\n", "
\n", "\n", "\n", "
\n", "

Output

\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For toggle switch" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "
\n", " \n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "\n", "
\n", "
\n", " \n", " \n", "
\n", " \n", " 75\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For Final Output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "
\n", "

\n", "

\n", "
" ] } ], "metadata": { "accelerator": "GPU", "colab": { "provenance": [] }, "gpuClass": "standard", "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 0 }