{ "cells": [ { "cell_type": "markdown", "id": "01991402-68d2-4cfb-9b3a-22f170ccf74b", "metadata": {}, "source": [ "# Build a smart assistant to help mental health counselors respond thoughtfully to patients. " ] }, { "cell_type": "markdown", "id": "f643f217-faf3-4b5a-9f14-63b898a4d7b8", "metadata": {}, "source": [ "## The Mental Health Chatbot (Multi-Turn with LLM + Classifier) works in **two steps**:\n", "\n", "1. **Understanding the situation**: When you describe a patient's issue, the system uses a machine learning model to figure out what kind of response might be most helpful—like giving advice, validating feelings, asking a follow-up question, or sharing some mental health information.\n", "\n", "2. **Generating a helpful reply**: After the system decides what type of response is appropriate, it asks a language model (Flan-T5) to write a suggestion based on that need. For example, if the model thinks the user needs validation, it will ask the LLM to generate an empathetic and supportive response.\n", "\n" ] }, { "cell_type": "markdown", "id": "0cae93e4-7e58-4497-8047-4a069bc7a6c6", "metadata": {}, "source": [ "## This following python code does:\n", "- Classifies user messages into response types (advice, validation, information, question)\n", "- Uses a language model (Flan-T5) to generate counselor-like responses\n", "- Maintains a limited conversation history\n", "- Allows exporting conversation history to a JSON file" ] }, { "cell_type": "markdown", "id": "d29686fa-8db3-4447-9970-da864a96dc64", "metadata": {}, "source": [ "### Load Required Libraries" ] }, { "cell_type": "code", "execution_count": 1, "id": "8375290a-0034-4050-af72-c76183020bec", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/Pi/miniconda3/envs/myenv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] } ], "source": [ "import json\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.feature_extraction.text import TfidfVectorizer\n", "from sklearn.preprocessing import LabelEncoder\n", "from xgboost import XGBClassifier\n", "from transformers import pipeline" ] }, { "cell_type": "markdown", "id": "c15554b4-d47f-4052-a18d-a51742065fc4", "metadata": {}, "source": [ "### Load and Label Dataset " ] }, { "cell_type": "code", "execution_count": 2, "id": "dbb763bb-eaa1-45ce-ad03-3d6bb300c4e4", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"dataset/Kaggle_Mental_Health_Conversations_train.csv\")\n", "df = df[['Context', 'Response']].dropna().copy()\n", "\n", "keywords_to_labels = {\n", " 'advice': ['try', 'should', 'suggest', 'recommend'],\n", " 'validation': ['understand', 'feel', 'valid', 'normal'],\n", " 'information': ['cause', 'often', 'disorder', 'symptom'],\n", " 'question': ['how', 'what', 'why', 'have you']\n", "}\n", "\n", "def auto_label_response(response):\n", " response = response.lower()\n", " for label, keywords in keywords_to_labels.items():\n", " if any(word in response for word in keywords):\n", " return label\n", " return 'information'\n", "\n", "df['response_type'] = df['Response'].apply(auto_label_response)\n", "\n" ] }, { "cell_type": "markdown", "id": "3cc76163-b688-49b1-a5ab-3f991e6a790b", "metadata": {}, "source": [ "### Train on Combined Context + Response" ] }, { "cell_type": "code", "execution_count": 3, "id": "5aff7aff-7756-4c28-9424-9a9963580883", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/Pi/miniconda3/envs/myenv/lib/python3.10/site-packages/xgboost/training.py:183: UserWarning: [22:52:39] WARNING: /Users/runner/work/xgboost/xgboost/src/learner.cc:738: \n", "Parameters: { \"use_label_encoder\" } are not used.\n", "\n", " bst.update(dtrain, iteration=i, fobj=obj)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "XGBClassifier(base_score=None, booster=None, callbacks=None,\n", " colsample_bylevel=None, colsample_bynode=None,\n", " colsample_bytree=None, device=None, early_stopping_rounds=None,\n", " enable_categorical=False, eval_metric='mlogloss',\n", " feature_types=None, feature_weights=None, gamma=None,\n", " grow_policy=None, importance_type=None,\n", " interaction_constraints=None, learning_rate=0.1, max_bin=None,\n", " max_cat_threshold=None, max_cat_to_onehot=None,\n", " max_delta_step=None, max_depth=6, max_leaves=None,\n", " min_child_weight=None, missing=nan, monotone_constraints=None,\n", " multi_strategy=None, n_estimators=100, n_jobs=None, num_class=4, ...)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['combined_text'] = df['Context'] + \" \" + df['Response']\n", "\n", "le = LabelEncoder()\n", "y = le.fit_transform(df['response_type'])\n", "\n", "vectorizer = TfidfVectorizer(max_features=2000, ngram_range=(1, 2))\n", "X = vectorizer.fit_transform(df['combined_text'])\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, y, test_size=0.2, stratify=y, random_state=42\n", ")\n", "\n", "xgb_model = XGBClassifier(\n", " objective='multi:softmax',\n", " num_class=len(le.classes_),\n", " eval_metric='mlogloss',\n", " use_label_encoder=False,\n", " max_depth=6,\n", " learning_rate=0.1,\n", " n_estimators=100\n", ")\n", "xgb_model.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "id": "c7b4e142-8479-4da0-8f6a-70e488f90349", "metadata": {}, "source": [ "### Load LLM (Flan-T5)" ] }, { "cell_type": "code", "execution_count": 4, "id": "ea35abb0-f2df-4331-b41a-965a9e42b4c5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading Flan-T5 model... (this may take a few seconds)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Device set to use mps:0\n" ] } ], "source": [ "print(\"Loading Flan-T5 model... (this may take a few seconds)\")\n", "llm = pipeline(\"text2text-generation\", model=\"google/flan-t5-base\")" ] }, { "cell_type": "markdown", "id": "c1ccf043-7431-4767-9ac8-73c59fe46ccf", "metadata": {}, "source": [ "### Prediction + Prompt Functions" ] }, { "cell_type": "code", "execution_count": 5, "id": "982161f8-824c-45c2-b968-e6e7ad5e0874", "metadata": {}, "outputs": [], "source": [ "def predict_response_type(user_input):\n", " combined = user_input + \" placeholder_response\"\n", " vec = vectorizer.transform([combined])\n", " prediction = xgb_model.predict(vec)[0]\n", " predicted_class = le.inverse_transform([prediction])[0]\n", " confidence = np.max(xgb_model.predict_proba(vec))\n", " return predicted_class, confidence\n", "\n", "def prompt_templates(user_input, response_type):\n", " templates = {\n", " \"advice\": f\"A student said: \\\"{user_input}\\\". What practical advice should a mental health counselor offer?\",\n", " \"validation\": f\"A student said: \\\"{user_input}\\\". Respond with an emotionally supportive message that shows empathy and validates their feelings.\",\n", " \"information\": f\"A student said: \\\"{user_input}\\\". Explain what might be happening emotionally from a counselor's perspective.\",\n", " \"question\": f\"A student said: \\\"{user_input}\\\". 
What are 1-2 thoughtful follow-up questions a counselor might ask?\"\n", " }\n", " return templates.get(response_type, templates[\"information\"])\n", "\n", "def generate_llm_response(user_input, response_type):\n", " prompt = prompt_templates(user_input, response_type)\n", " result = llm(prompt, max_length=150, do_sample=True, temperature=0.7, top_p=0.9)\n", " return result[0][\"generated_text\"].strip()" ] }, { "cell_type": "markdown", "id": "8599f696-9a22-4f03-9021-596b46febf26", "metadata": {}, "source": [ "### Conversation Memory + Exporting" ] }, { "cell_type": "code", "execution_count": 6, "id": "f95ba731-e303-42b2-a46d-143f5aaeb914", "metadata": {}, "outputs": [], "source": [ "MAX_MEMORY_TURNS = 6\n", "history = []\n", "\n", "def trim_memory(history, max_turns=MAX_MEMORY_TURNS):\n", " return history[-max_turns:]\n", "\n", "def save_conversation(history, filename=\"chat_history.json\"):\n", " with open(filename, \"w\") as f:\n", " json.dump(history, f, indent=2)\n", " print(f\"\\nConversation saved to {filename}\")" ] }, { "cell_type": "markdown", "id": "74e2aba2-20fd-48b7-b770-8206aa9fb396", "metadata": {}, "source": [ "### Intro + Chat" ] }, { "cell_type": "code", "execution_count": 7, "id": "f3e4b5a8-f2b0-4ed4-b6d3-23541f615c0a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Multi-Turn Mental Health Chatbot ---\n", "This assistant simulates a counselor's conversation using AI.\n", "- Type something your patient/student might say\n", "- Type 'save' to export the conversation\n", "- Type 'exit' to quit\n", "\n", "Example:\n", "User: I feel like I’ll mess up my big presentation tomorrow.\n", "Counselor: It’s completely normal to feel nervous before a big event...\n", "\n" ] } ], "source": [ "def show_intro():\n", " print(\"\\n--- Multi-Turn Mental Health Chatbot ---\")\n", " print(\"This assistant simulates a counselor's conversation using AI.\")\n", " print(\"- Type something your patient/student might say\")\n", " print(\"- Type 'save' to export the conversation\")\n", " print(\"- Type 'exit' to quit\\n\")\n", "\n", " print(\"Example:\")\n", " print(\"User: I feel like I’ll mess up my big presentation tomorrow.\")\n", " print(\"Counselor: It’s completely normal to feel nervous before a big event...\\n\")\n", "\n", "show_intro()" ] }, { "cell_type": "markdown", "id": "1b818787-dab2-4d0c-a384-55cfc6072b9d", "metadata": {}, "source": [ "### Chat Loop" ] }, { "cell_type": "code", "execution_count": 8, "id": "748b8c59-fbc0-4ca3-814f-5f299128fddd", "metadata": {}, "outputs": [ { "name": "stdin", "output_type": "stream", "text": [ "User: i'm nervous\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "(Predicted: information, Confidence: 85.5%)\n", "Counselor: The student might be feeling anxious or uncertain about the situation.\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ "User: exit\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Goodbye\n" ] } ], "source": [ "while True:\n", " user_input = input(\"User: \").strip()\n", "\n", " if user_input.lower() == \"exit\":\n", " print(\"Goodbye\")\n", " break\n", " elif user_input.lower() == \"save\":\n", " save_conversation(history)\n", " continue\n", "\n", " predicted_type, confidence = predict_response_type(user_input)\n", " print(f\"(Predicted: {predicted_type}, Confidence: {confidence:.1%})\")\n", "\n", " llm_reply = generate_llm_response(user_input, predicted_type)\n", "\n", " history.append({\"role\": \"user\", \"content\": user_input})\n", " 
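# NOTE: this rolling history is only used by the 'save' command for export;\n", " # it is not passed back into the LLM prompt, so each reply is generated from the latest message alone.\n", " 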
history.append({\"role\": \"assistant\", \"content\": llm_reply})\n", " history = trim_memory(history)\n", "\n", " print(\"Counselor:\", llm_reply)" ] }, { "cell_type": "code", "execution_count": null, "id": "72251dc9-6090-4448-8fe9-04ad82079520", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python (myenv)", "language": "python", "name": "myenv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, "nbformat_minor": 5 }