{ "cells": [ { "cell_type": "markdown", "id": "01991402-68d2-4cfb-9b3a-22f170ccf74b", "metadata": {}, "source": [ "# Build a smart assistant to help mental health counselors respond thoughtfully to patients" ] }, { "cell_type": "markdown", "id": "f643f217-faf3-4b5a-9f14-63b898a4d7b8", "metadata": {}, "source": [ "## The Mental Health Chatbot (Multi-Turn with LLM + Classifier) works in **two steps**:\n", "\n", "1. **Understanding the situation**: When you describe a patient's issue, the system uses a machine learning classifier (XGBoost) to predict which kind of response is likely to be most helpful: giving advice, validating feelings, asking a follow-up question, or sharing mental health information.\n", "\n", "2. **Generating a helpful reply**: Once the classifier has chosen a response type, the system asks a language model (Flan-T5) to write a suggestion of that type. For example, if the classifier predicts that validation is needed, the LLM is prompted to generate an empathetic and supportive response.\n", "\n" ] }, { "cell_type": "markdown", "id": "0cae93e4-7e58-4497-8047-4a069bc7a6c6", "metadata": {}, "source": [ "## The following Python code:\n", "- Classifies user messages into response types (advice, validation, information, question)\n", "- Uses a language model (Flan-T5) to generate counselor-like responses\n", "- Maintains a limited conversation history\n", "- Allows exporting conversation history to a JSON file" ] }, { "cell_type": "markdown", "id": "d29686fa-8db3-4447-9970-da864a96dc64", "metadata": {}, "source": [ "### Load Required Libraries" ] }, { "cell_type": "code", "execution_count": 1, "id": "8375290a-0034-4050-af72-c76183020bec", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/Pi/miniconda3/envs/myenv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] } ], "source": [ "import json\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.feature_extraction.text import TfidfVectorizer\n", "from sklearn.preprocessing import LabelEncoder\n", "from xgboost import XGBClassifier\n", "from transformers import pipeline" ] }, { "cell_type": "markdown", "id": "c15554b4-d47f-4052-a18d-a51742065fc4", "metadata": {}, "source": [ "### Load and Label Dataset " ] }, { "cell_type": "code", "execution_count": 2, "id": "dbb763bb-eaa1-45ce-ad03-3d6bb300c4e4", "metadata": {}, "outputs": [], "source": [ "# Load the conversations and keep only complete Context/Response pairs.\n", "df = pd.read_csv(\"dataset/Kaggle_Mental_Health_Conversations_train.csv\")\n", "df = df[['Context', 'Response']].dropna().copy()\n", "\n", "# Keyword lists used to assign a heuristic response type to each response.\n", "keywords_to_labels = {\n", "    'advice': ['try', 'should', 'suggest', 'recommend'],\n", "    'validation': ['understand', 'feel', 'valid', 'normal'],\n", "    'information': ['cause', 'often', 'disorder', 'symptom'],\n", "    'question': ['how', 'what', 'why', 'have you']\n", "}\n", "\n", "def auto_label_response(response):\n", "    # Keywords are matched as substrings of the lowercased response;\n", "    # the first label (in dict order) with a matching keyword wins.\n", "    response = response.lower()\n", "    for label, keywords in keywords_to_labels.items():\n", "        if any(word in response for word in keywords):\n", "            return label\n", "    # Fall back to 'information' when no keyword matches.\n", "    return 'information'\n", "\n", "df['response_type'] = df['Response'].apply(auto_label_response)\n", "\n" ] }, { "cell_type": "markdown", "id": "3cc76163-b688-49b1-a5ab-3f991e6a790b", "metadata": {}, "source": [ "### Train on Combined Context + Response" ] }, { "cell_type": "code", "execution_count": 3, "id": "5aff7aff-7756-4c28-9424-9a9963580883", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/Pi/miniconda3/envs/myenv/lib/python3.10/site-packages/xgboost/training.py:183: UserWarning: [22:52:39] WARNING: 
/Users/runner/work/xgboost/xgboost/src/learner.cc:738: \n", "Parameters: { \"use_label_encoder\" } are not used.\n", "\n", " bst.update(dtrain, iteration=i, fobj=obj)\n" ] }, { "data": { "text/html": [ "
XGBClassifier(base_score=None, booster=None, callbacks=None,\n", " colsample_bylevel=None, colsample_bynode=None,\n", " colsample_bytree=None, device=None, early_stopping_rounds=None,\n", " enable_categorical=False, eval_metric='mlogloss',\n", " feature_types=None, feature_weights=None, gamma=None,\n", " grow_policy=None, importance_type=None,\n", " interaction_constraints=None, learning_rate=0.1, max_bin=None,\n", " max_cat_threshold=None, max_cat_to_onehot=None,\n", " max_delta_step=None, max_depth=6, max_leaves=None,\n", " min_child_weight=None, missing=nan, monotone_constraints=None,\n", " multi_strategy=None, n_estimators=100, n_jobs=None, num_class=4, ...)