{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"gpuClass": "standard"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Implementation of text classification with BERT\n",
"\n",
"\n",
"Still Working on it.\n",
"\n",
"This notebook is based in this TensorFlow tutorial: [Classify text with BERT](https://www.tensorflow.org/tutorials/text/classify_text_wibert)\n",
"\n",
"BERT [(article link)](https://arxiv.org/abs/1810.04805) and other Transformer encoder architectures have been wildly successful on a variety of tasks in NLP (natural language processing). They compute vector-space representations of natural language that are suitable for use in deep learning models.\n",
"\n",
":\n",
"\n",
"Source: http://www.d2l.ai/chapter_natural-language-processing-pretraining/index.html\n",
"\n",
"BERT models are usually pre-trained on a large corpus of text, then fine-tuned for specific tasks.\n",
"\n",
"In this notebook, I am going to use a pretreined BERT to compute vector-space representations of a hate speech dataset to feed two different downsteam Archtectures (CNN and MLP).\n",
"\n",
"Sentiment Analysis\n",
"\n",
"This notebook trains a sentiment analysis model to classify the [Hate Speech and Offensive Language Dataset]( https://www.kaggle.com/mrmorj/hate-speech-and-offensive-language-dataset) tweets in three classes:\n",
" \n",
"* 0 - hate speech \n",
"* 1 - offensive language \n",
"* 2 - neither as positive or negative"
],
"metadata": {
"id": "Jh_WkIs1iJDs"
}
},
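{
"cell_type": "markdown",
"source": [
"As a warm-up, the next cell is a minimal sketch (added for illustration, not part of the original pipeline) of what computing vector-space representations with BERT looks like: a TF Hub preprocessing model turns raw strings into token ids, and the encoder returns a pooled sentence vector plus per-token vectors. The small-BERT handles below are one possible choice, not necessarily the checkpoint used by the saved classifier later on."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"import tensorflow_hub as hub\n",
"import tensorflow_text as text  # registers the custom ops the preprocessor needs\n",
"\n",
"# One possible (small) BERT from TF Hub; any compatible pair of handles works.\n",
"preprocessor = hub.KerasLayer(\n",
"    'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3')\n",
"encoder = hub.KerasLayer(\n",
"    'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1')\n",
"\n",
"sentences = tf.constant(['an example tweet'])\n",
"encoder_inputs = preprocessor(sentences)   # dict: input_word_ids, input_mask, input_type_ids\n",
"outputs = encoder(encoder_inputs)\n",
"pooled = outputs['pooled_output']          # shape (batch, 512): one vector per sentence\n",
"tokens = outputs['sequence_output']        # shape (batch, seq_len, 512): one vector per token"
]
},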
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ltc_HOzjX87s"
},
"outputs": [],
"source": [
"!pip install -q tensorflow-text > /dev/null\n",
"!pip install -q tf-models-official > /dev/null"
]
},
{
"cell_type": "code",
"source": [
"from official.nlp.optimization import AdamWeightDecay, WarmUp\n",
"import tensorflow as tf\n",
"import tensorflow_hub as hub\n",
"import tensorflow_text as text\n",
"import numpy as np\n",
"np.set_printoptions(suppress=True)\n",
"\n",
"\n",
"# Carga el modelo\n",
"with tf.keras.utils.custom_object_scope({'AdamWeightDecay': AdamWeightDecay, 'WarmUp': WarmUp}):\n",
" classifier_model = tf.keras.models.load_model('classifier_model.h5', \n",
" custom_objects={'KerasLayer': hub.KerasLayer})"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EQlIdYjKYAnn",
"outputId": "efb1f87f-8b45-4201-ac14-2abdc74b8cfd"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.data_structures has been moved to tensorflow.python.trackable.data_structures. The old module will be deleted in version 2.11.\n",
"WARNING:tensorflow:From /usr/local/lib/python3.9/dist-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.\n",
"Instructions for updating:\n",
"Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089\n",
"WARNING:tensorflow:Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.\n"
]
}
]
},
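{
"cell_type": "markdown",
"source": [
"The exact architecture inside classifier_model.h5 is not shown in this notebook. For reference, here is a hedged sketch of how a BERT encoder with a small MLP head (one of the two downstream architectures mentioned above) could be assembled; the handle names and layer sizes are illustrative assumptions, not the saved model."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def build_classifier(preprocess_handle, encoder_handle, num_classes=3):\n",
"    # Raw strings in, class probabilities out; preprocessing lives inside the model.\n",
"    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')\n",
"    encoder_inputs = hub.KerasLayer(preprocess_handle, name='preprocessing')(text_input)\n",
"    bert_outputs = hub.KerasLayer(encoder_handle, trainable=True, name='BERT_encoder')(encoder_inputs)\n",
"    net = bert_outputs['pooled_output']                        # sentence embedding\n",
"    net = tf.keras.layers.Dropout(0.1)(net)\n",
"    net = tf.keras.layers.Dense(64, activation='relu')(net)   # illustrative MLP hidden layer\n",
"    net = tf.keras.layers.Dense(num_classes, activation='softmax')(net)\n",
"    return tf.keras.Model(text_input, net)"
]
},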
{
"cell_type": "code",
"source": [
"classifier_model.predict([\"LEETSSS GOOO Get those ... outta here!!!!!!\"])"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6Ma3P-7iYEbA",
"outputId": "2e6fbc37-6ab8-4035-c706-f8d6d3d8c7ba"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"1/1 [==============================] - 1s 1s/step\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[0.99998355, 0.00001638, 0.00000017]], dtype=float32)"
]
},
"metadata": {},
"execution_count": 4
}
]
},
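{
"cell_type": "markdown",
"source": [
"The prediction above is a probability distribution over the three classes listed in the introduction. A small helper (added for illustration; the label order simply follows the dataset classes 0/1/2) turns it into a readable label:"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Label order follows the dataset classes described in the introduction.\n",
"class_names = ['hate speech', 'offensive language', 'neither']\n",
"\n",
"probs = classifier_model.predict(['LEETSSS GOOO Get those ... outta here!!!!!!'])\n",
"pred = int(np.argmax(probs, axis=-1)[0])\n",
"print(f'{class_names[pred]} (p = {probs[0][pred]:.4f})')"
]
}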
]
}