{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "gpuClass": "standard"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Implementation of text classification with BERT\n",
        "\n",
        "\n",
        "Still Working on it.\n",
        "\n",
        "This notebook is based in this TensorFlow tutorial: [Classify text with BERT](https://www.tensorflow.org/tutorials/text/classify_text_wibert)\n",
        "\n",
        "BERT [(article link)](https://arxiv.org/abs/1810.04805) and other Transformer encoder architectures have been wildly successful on a variety of tasks in NLP (natural language processing). They compute vector-space representations of natural language that are suitable for use in deep learning models.\n",
        "\n",
        "![](http://www.d2l.ai/_images/nlp-map-pretrain.svg):\n",
        "\n",
        "Source: http://www.d2l.ai/chapter_natural-language-processing-pretraining/index.html\n",
        "\n",
        "BERT models are usually pre-trained on a large corpus of text, then fine-tuned for specific tasks.\n",
        "\n",
        "In this notebook, I am going to use a pretreined BERT to compute vector-space representations of a hate speech dataset to feed two different downsteam Archtectures (CNN and MLP).\n",
        "\n",
        "Sentiment Analysis\n",
        "\n",
        "This notebook trains a sentiment analysis model to classify the [Hate Speech and Offensive Language Dataset]( https://www.kaggle.com/mrmorj/hate-speech-and-offensive-language-dataset) tweets in three classes:\n",
        " \n",
        "* 0 - hate speech \n",
        "* 1 - offensive language \n",
        "* 2 - neither as positive or negative"
      ],
      "metadata": {
        "id": "Jh_WkIs1iJDs"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Ltc_HOzjX87s"
      },
      "outputs": [],
      "source": [
        "!pip install -q tensorflow-text > /dev/null\n",
        "!pip install -q tf-models-official > /dev/null"
      ]
    },
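    {
      "cell_type": "markdown",
      "source": [
        "The model loaded below (`classifier_model.h5`) was trained and saved outside this notebook, so its exact architecture is not shown here. As an illustration only, the next cell sketches how such a BERT classifier could be assembled in the style of the referenced tutorial: a TF Hub preprocessing layer, a small BERT encoder, dropout, and a softmax head with one unit per class. The Hub handles are assumptions, not necessarily the ones behind the saved model."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import tensorflow as tf\n",
        "import tensorflow_hub as hub\n",
        "import tensorflow_text as text  # registers the preprocessing ops\n",
        "\n",
        "# TF Hub handles in the style of the referenced tutorial (assumed; the\n",
        "# encoder behind classifier_model.h5 may differ).\n",
        "tfhub_preprocess = 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3'\n",
        "tfhub_encoder = 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1'\n",
        "\n",
        "def build_classifier_model(num_classes=3):\n",
        "    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')\n",
        "    encoder_inputs = hub.KerasLayer(tfhub_preprocess, name='preprocessing')(text_input)\n",
        "    encoder = hub.KerasLayer(tfhub_encoder, trainable=True, name='BERT_encoder')\n",
        "    # 'pooled_output' is the encoder's summary vector for the whole input text.\n",
        "    net = encoder(encoder_inputs)['pooled_output']\n",
        "    net = tf.keras.layers.Dropout(0.1)(net)\n",
        "    # One unit per class; softmax yields probability rows like the one shown later.\n",
        "    net = tf.keras.layers.Dense(num_classes, activation='softmax', name='classifier')(net)\n",
        "    return tf.keras.Model(text_input, net)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },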
    {
      "cell_type": "code",
      "source": [
        "from official.nlp.optimization import AdamWeightDecay, WarmUp\n",
        "import tensorflow as tf\n",
        "import tensorflow_hub as hub\n",
        "import tensorflow_text as text\n",
        "import numpy as np\n",
        "np.set_printoptions(suppress=True)\n",
        "\n",
        "\n",
        "# Carga el modelo\n",
        "with tf.keras.utils.custom_object_scope({'AdamWeightDecay': AdamWeightDecay, 'WarmUp': WarmUp}):\n",
        "    classifier_model = tf.keras.models.load_model('classifier_model.h5', \n",
        "                                                  custom_objects={'KerasLayer': hub.KerasLayer})"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "EQlIdYjKYAnn",
        "outputId": "efb1f87f-8b45-4201-ac14-2abdc74b8cfd"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.data_structures has been moved to tensorflow.python.trackable.data_structures. The old module will be deleted in version 2.11.\n",
            "WARNING:tensorflow:From /usr/local/lib/python3.9/dist-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.\n",
            "Instructions for updating:\n",
            "Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089\n",
            "WARNING:tensorflow:Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.\n"
          ]
        }
      ]
    },
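    {
      "cell_type": "markdown",
      "source": [
        "The `Error in loading the saved optimizer state` warning above means training would resume with a freshly initialized optimizer. If the model is only used for prediction, one option (an assumption about intent, not something this notebook originally does) is to load it with `compile=False`, which skips the training configuration and the custom `AdamWeightDecay`/`WarmUp` objects entirely:"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import tensorflow as tf\n",
        "import tensorflow_hub as hub\n",
        "\n",
        "# Inference-only load: compile=False skips deserializing the optimizer state,\n",
        "# so the warning above does not apply and no custom optimizer objects are needed.\n",
        "inference_model = tf.keras.models.load_model(\n",
        "    'classifier_model.h5',\n",
        "    custom_objects={'KerasLayer': hub.KerasLayer},\n",
        "    compile=False)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },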
    {
      "cell_type": "code",
      "source": [
        "classifier_model.predict([\"LEETSSS GOOO Get those ... outta here!!!!!!\"])"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "6Ma3P-7iYEbA",
        "outputId": "2e6fbc37-6ab8-4035-c706-f8d6d3d8c7ba"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "1/1 [==============================] - 1s 1s/step\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "array([[0.99998355, 0.00001638, 0.00000017]], dtype=float32)"
            ]
          },
          "metadata": {},
          "execution_count": 4
        }
      ]
    }
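    ,
    {
      "cell_type": "markdown",
      "source": [
        "`predict` returns one row of softmax probabilities per input text, in the class order listed at the top of the notebook (0 = hate speech, 1 = offensive language, 2 = neither). The small helper below is hypothetical and only illustrates how to turn that array into a readable label:"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "\n",
        "# Class order follows the Hate Speech and Offensive Language Dataset labels.\n",
        "class_names = ['hate speech', 'offensive language', 'neither']\n",
        "\n",
        "probs = classifier_model.predict(['LEETSSS GOOO Get those ... outta here!!!!!!'])\n",
        "for row in probs:\n",
        "    print(class_names[int(np.argmax(row))], row)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    }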
  ]
}