initial import

- README.md +177 -12
- app.py +274 -0
- model/LightGBM_model_columns.joblib +3 -0
- model/RandomForest_model_columns.joblib +3 -0
- model/XGBoost_model_columns.joblib +3 -0
- model/label_encoder.joblib +3 -0
- model/obesity_LightGBM_model.joblib +3 -0
- model/obesity_RandomForest_model.joblib +3 -0
- model/obesity_XGBoost_model.joblib +3 -0
- obesity_rp/__init__.py +0 -0
- obesity_rp/config.py +27 -0
- requirements.txt +68 -0
README.md
CHANGED
@@ -1,12 +1,177 @@

Multi-Model Performance Analysis for Obesity Risk Classification
================================================================

1\. Project Overview
--------------------

[This project](https://github.com/ethicalabs-ai/ObesityRiskPredictor) provides a framework for training, evaluating, and comparing several widely used machine learning models on the multi-class task of obesity risk prediction.

The primary objective is a comparative analysis to determine which modeling approach yields the highest predictive accuracy and robustness on this dataset.

The experiment is designed as a benchmark, showcasing a standardized pipeline that covers data preprocessing, hyperparameter optimization, and model evaluation.

By comparing an ensemble bagging model (Random Forest) against two gradient boosting implementations (LightGBM and XGBoost), we aim to identify the most effective architecture for this kind of tabular data problem.

2\. Technical Architecture & Methodologies
------------------------------------------

### 2.1. Models Evaluated

The experiment evaluates three tree-based ensemble models:

- **Random Forest Classifier:** An ensemble method based on bagging. It trains many decision trees, each on a bootstrap sample of the training data and considering a random subset of features at each split, and combines their predictions: the original formulation takes a majority vote, while scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities. It is known for its robustness and its ability to handle high-dimensional data.
- **LightGBM (Light Gradient Boosting Machine):** A gradient boosting framework for tree-based learning. It uses histogram-based split finding and leaf-wise tree growth, which typically gives faster training and lower memory usage than many other boosting implementations.
- **XGBoost (eXtreme Gradient Boosting):** An optimized, distributed gradient boosting library designed for efficiency, flexibility, and portability. It implements machine learning algorithms under the Gradient Boosting framework and provides parallel tree boosting (also known as GBDT or GBM).
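
To make the Random Forest aggregation point concrete, the sketch below fits scikit-learn's `RandomForestClassifier` on a small synthetic dataset (generated with `make_classification`, not the obesity data) and checks that the forest's `predict_proba` equals the mean of its individual trees' probability estimates. The hyperparameters are arbitrary illustrative values, not the project's tuned settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic multi-class problem used as a stand-in for the real dataset
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Illustrative settings only; these are not the project's tuned parameters
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Average the per-tree class-probability estimates by hand
tree_mean = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)

forest_proba = forest.predict_proba(X)
print(np.allclose(forest_proba, tree_mean))  # True: soft voting, not hard voting
```

Because the forest averages probabilities, its predicted class is the argmax of that average, which can differ from a simple majority vote when trees disagree.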

### 2.2. Dataset

The analysis is performed on the **"Estimation of Obesity Levels Based On Eating Habits and Physical Condition"** dataset.

- **Task Type:** Multi-Class Classification
- **Features:** The dataset comprises a mix of numerical (e.g., `Age`, `Height`, `Weight`) and categorical (e.g., `Gender`, `family_history_with_overweight`, `MTRANS`) variables.
- **Instances:** 2111
- **Attributes:** 16 predictive features and 1 target variable (`NObeyesdad`) with 7 obesity-level classes.

#### 2.2.1. Dataset Source and Composition

This dataset was created to estimate obesity levels in individuals from Mexico, Peru, and Colombia. It combines real and synthetically generated data:

- **23%** of the data was collected directly from users via a web platform.
- **77%** of the data was generated synthetically using the SMOTE (Synthetic Minority Over-sampling Technique) filter in Weka to address class imbalance.
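
SMOTE creates a synthetic minority-class sample by picking a real minority sample, choosing one of its k nearest minority-class neighbours, and placing a new point at a random position on the line segment between the two. The sketch below is a simplified pure-Python illustration of that interpolation step for numeric features only; it is not Weka's implementation, and the function name `smote`, the toy points, and `k=2` are hypothetical choices.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Simplified SMOTE for numeric features: interpolate between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`
        others = sorted(
            (m for m in minority if m is not base),
            key=lambda m: math.dist(base, m),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two numeric features (e.g. Height in m, Weight in kg)
minority = [[1.50, 40.0], [1.55, 42.0], [1.60, 45.0], [1.52, 41.0]]
new_points = smote(minority, n_new=5)
print(len(new_points))
```

Every synthetic point is a convex combination of two real minority samples, so it always lies inside the region spanned by the original minority class.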

#### 2.2.2. Citation

Proper credit is given to the creators of this dataset.

- **Source:** [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/544/estimation+of+obesity+levels+based+on+eating+habits+and+physical+condition)
- **Creators:** Palechor, F. M., & de la Hoz Manotas, A. (2019).

### 2.3. Data Preprocessing Pipeline

A standardized preprocessing pipeline is applied to ensure data quality and compatibility with the machine learning models:

- **Categorical Feature Encoding:** **One-Hot Encoding** is applied to all categorical features (`Gender`, `family_history_with_overweight`, `FAVC`, `CAEC`, `SMOKE`, `SCC`, `CALC`, `MTRANS`). This converts categorical data into a numerical format without introducing an ordinal relationship, creating one binary column per category.
- **Target Variable Encoding:** The multi-class target variable (`NObeyesdad`) is converted into integer class labels using **Label Encoding**.
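
A minimal sketch of these two encoding steps, assuming pandas' `get_dummies` for the one-hot step (its default `<feature>_<value>` column naming matches the names `app.py` looks up) and scikit-learn's `LabelEncoder` for the target. The three-row DataFrame is toy data using a few of the dataset's column names, not an excerpt of the real file.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows using a subset of the dataset's columns (illustrative values)
df = pd.DataFrame(
    {
        "Gender": ["Female", "Male", "Male"],
        "Age": [21.0, 23.0, 27.0],
        "MTRANS": ["Public_Transportation", "Public_Transportation", "Walking"],
        "NObeyesdad": ["Normal_Weight", "Normal_Weight", "Overweight_Level_I"],
    }
)

# One-hot encode the nominal features; numeric columns pass through unchanged
X = pd.get_dummies(
    df.drop(columns=["NObeyesdad"]), columns=["Gender", "MTRANS"], dtype=int
)

# Label-encode the target; classes are sorted alphabetically before numbering
le = LabelEncoder()
y = le.fit_transform(df["NObeyesdad"])

print(list(X.columns))
# ['Age', 'Gender_Female', 'Gender_Male', 'MTRANS_Public_Transportation', 'MTRANS_Walking']
print(y.tolist())  # [0, 0, 1]
```

The alphabetical ordering used by `LabelEncoder` is why the evaluation log maps `Insufficient_Weight` to 0 and `Overweight_Level_II` to 6.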

### 2.4. Hyperparameter Optimization

To tune each model, we use a systematic hyperparameter search:

- **Strategy:** `RandomizedSearchCV` is used to search a defined parameter space for each model. Instead of trying every combination, it samples a fixed number of parameter settings (`n_iter`) from the specified distributions, which balances computational cost against tuning effectiveness.
- **Cross-Validation:** `StratifiedKFold` cross-validation with 5 splits is used within the search. Each fold preserves the overall class distribution, which keeps evaluation reliable on multi-class datasets that may have imbalanced classes.
- **Optimization Metric:** **Accuracy** is the scoring metric used to select the best parameter set during the search.
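
The search setup described above might look roughly like this with scikit-learn. The parameter space, `n_iter=5`, and the synthetic data are hypothetical placeholders (the actual search spaces are not listed here), and Random Forest stands in for any of the three models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for the preprocessed training split
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6, n_classes=4, random_state=42
)

# Hypothetical search space; not the project's real distributions
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
}

# 5 stratified folds keep each fold's class proportions close to the full data's
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,  # number of sampled parameter combinations
    scoring="accuracy",
    cv=cv,
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.4f}")
```

With `refit=True` (the default), `search.best_estimator_` is retrained on all of `X` with the winning parameters and is the object that would typically be saved.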

### 2.5. Model Evaluation

The tuned models are assessed using a standard set of classification metrics:

- **Overall Accuracy:** The fraction of correct predictions across all classes; the primary measure of model performance.
- **Classification Report:** A detailed report with per-class metrics, each computed by treating one class as positive (TP, FP, FN: true positives, false positives, false negatives for that class), including:
  - **Precision:** TP / (TP + FP); the ability of the classifier not to label as positive a sample that is negative.
  - **Recall (Sensitivity):** TP / (TP + FN); the ability of the classifier to find all the positive samples.
  - **F1-Score:** The harmonic mean of precision and recall.
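
For a single class treated as positive, the three report metrics reduce to TP, FP, and FN counts. The helper below is a hypothetical pure-Python illustration of those formulas on toy labels; scikit-learn's `classification_report` computes the same per-class values.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for `label`, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the (unweighted) harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = ["A", "A", "A", "B", "B", "B"]
y_pred = ["A", "A", "B", "B", "B", "B"]

print(per_class_metrics(y_true, y_pred, "A"))  # precision 1.0, recall 2/3, F1 0.8
print(per_class_metrics(y_true, y_pred, "B"))  # precision 0.75, recall 1.0, F1 ~0.857
```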

### 2.6. Evaluation Results

```
--- Starting Model Evaluation ---
Attempting to load dataset from 'datasets/ObesityDataSet_raw_and_data_sinthetic.csv'...
Dataset loaded successfully.
Dataset Head:
Gender Age Height Weight family_history_with_overweight FAVC FCVC NCP ... SMOKE CH2O SCC FAF TUE CALC MTRANS NObeyesdad
0 Female 21.0 1.62 64.0 yes no 2.0 3.0 ... no 2.0 no 0.0 1.0 no Public_Transportation Normal_Weight
1 Female 21.0 1.52 56.0 yes no 3.0 3.0 ... yes 3.0 yes 3.0 0.0 Sometimes Public_Transportation Normal_Weight
2 Male 23.0 1.80 77.0 yes no 2.0 3.0 ... no 2.0 no 2.0 1.0 Frequently Public_Transportation Normal_Weight
3 Male 27.0 1.80 87.0 no no 3.0 3.0 ... no 2.0 no 2.0 0.0 Frequently Walking Overweight_Level_I
4 Male 22.0 1.78 89.8 no no 2.0 1.0 ... no 2.0 no 0.0 0.0 Sometimes Public_Transportation Overweight_Level_II

[5 rows x 17 columns]

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2111 entries, 0 to 2110
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype
---  ------                          --------------  -----
 0   Gender                          2111 non-null   object
 1   Age                             2111 non-null   float64
 2   Height                          2111 non-null   float64
 3   Weight                          2111 non-null   float64
 4   family_history_with_overweight  2111 non-null   object
 5   FAVC                            2111 non-null   object
 6   FCVC                            2111 non-null   float64
 7   NCP                             2111 non-null   float64
 8   CAEC                            2111 non-null   object
 9   SMOKE                           2111 non-null   object
 10  CH2O                            2111 non-null   float64
 11  SCC                             2111 non-null   object
 12  FAF                             2111 non-null   float64
 13  TUE                             2111 non-null   float64
 14  CALC                            2111 non-null   object
 15  MTRANS                          2111 non-null   object
 16  NObeyesdad                      2111 non-null   object
dtypes: float64(8), object(9)
memory usage: 280.5+ KB

Preprocessing data...
Target classes mapped: {'Insufficient_Weight': np.int64(0), 'Normal_Weight': np.int64(1), 'Obesity_Type_I': np.int64(2), 'Obesity_Type_II': np.int64(3), 'Obesity_Type_III': np.int64(4), 'Overweight_Level_I': np.int64(5), 'Overweight_Level_II': np.int64(6)}
RandomForest Model, feature columns, and label encoder loaded for prediction.

Evaluating RandomForest performance...
RandomForest Accuracy: 0.9480
RandomForest Classification Report:
                     precision    recall  f1-score   support

Insufficient_Weight       1.00      0.93      0.96        54
      Normal_Weight       0.79      0.97      0.87        58
     Obesity_Type_I       0.94      0.97      0.96        70
    Obesity_Type_II       1.00      0.98      0.99        60
   Obesity_Type_III       1.00      0.98      0.99        65
 Overweight_Level_I       0.96      0.84      0.90        58
Overweight_Level_II       0.98      0.95      0.96        58

           accuracy                           0.95       423
          macro avg       0.95      0.95      0.95       423
       weighted avg       0.95      0.95      0.95       423

LightGBM Model, feature columns, and label encoder loaded for prediction.

Evaluating LightGBM performance...
LightGBM Accuracy: 0.9716
LightGBM Classification Report:
                     precision    recall  f1-score   support

Insufficient_Weight       1.00      0.94      0.97        54
      Normal_Weight       0.89      1.00      0.94        58
     Obesity_Type_I       0.96      0.99      0.97        70
    Obesity_Type_II       1.00      0.98      0.99        60
   Obesity_Type_III       1.00      0.98      0.99        65
 Overweight_Level_I       0.98      0.91      0.95        58
Overweight_Level_II       0.98      0.98      0.98        58

           accuracy                           0.97       423
          macro avg       0.97      0.97      0.97       423
       weighted avg       0.97      0.97      0.97       423

XGBoost Model, feature columns, and label encoder loaded for prediction.

Evaluating XGBoost performance...
XGBoost Accuracy: 0.9527
XGBoost Classification Report:
                     precision    recall  f1-score   support

Insufficient_Weight       0.98      0.89      0.93        54
      Normal_Weight       0.82      0.97      0.89        58
     Obesity_Type_I       0.97      0.97      0.97        70
    Obesity_Type_II       0.98      0.98      0.98        60
   Obesity_Type_III       1.00      0.98      0.99        65
 Overweight_Level_I       0.96      0.90      0.93        58
Overweight_Level_II       0.97      0.97      0.97        58

           accuracy                           0.95       423
          macro avg       0.96      0.95      0.95       423
       weighted avg       0.96      0.95      0.95       423

--- Model Evaluation Finished ---
```
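
The summary rows of each report follow directly from the per-class rows. As a quick arithmetic check, the snippet below recomputes the Random Forest macro and weighted F1 averages from the rounded per-class values in the log, and recovers the number of correctly classified test samples from the reported accuracy. Because the inputs are already rounded to two decimals, the recomputed averages only match the report to two decimals.

```python
# Per-class F1-score and support, copied from the RandomForest report in the log
rf_report = {
    "Insufficient_Weight": (0.96, 54),
    "Normal_Weight": (0.87, 58),
    "Obesity_Type_I": (0.96, 70),
    "Obesity_Type_II": (0.99, 60),
    "Obesity_Type_III": (0.99, 65),
    "Overweight_Level_I": (0.90, 58),
    "Overweight_Level_II": (0.96, 58),
}

f1_scores = [f1 for f1, _ in rf_report.values()]
supports = [support for _, support in rf_report.values()]
total = sum(supports)  # 423 test samples

# Macro average: every class counts equally
macro_f1 = sum(f1_scores) / len(f1_scores)
# Weighted average: each class is weighted by its support
weighted_f1 = sum(f1 * s for f1, s in rf_report.values()) / total

# Accuracy = correct predictions / total, so correct = accuracy * total
correct = round(0.9480 * total)

print(f"macro F1 ~ {macro_f1:.2f}, weighted F1 ~ {weighted_f1:.2f}")  # 0.95, 0.95
print(f"{correct} of {total} test samples classified correctly")  # 401 of 423
```

Applying the same arithmetic to the other two accuracies gives 411 (LightGBM, 0.9716) and 403 (XGBoost, 0.9527) correct predictions out of 423, so the gap between the best and worst model is about ten test samples.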
app.py
ADDED
@@ -0,0 +1,274 @@
```python
import pandas as pd

# Not referenced directly, but LightGBM and XGBoost must be installed so that
# joblib can unpickle the saved boosted-tree models.
import lightgbm as lgb  # noqa: F401
import xgboost as xgb  # noqa: F401
import gradio as gr
import joblib
import os

from obesity_rp import config as cfg

# Global variables to store loaded models, their columns, and the label encoder
loaded_models = {}
loaded_model_columns_map = {}
label_encoder = None


def load_model_artifacts(model_name):
    """
    Loads the trained model, feature columns, and the label encoder.
    """
    model_file = os.path.join(cfg.MODEL_DIR, f"obesity_{model_name}_model.joblib")
    columns_file = os.path.join(cfg.MODEL_DIR, f"{model_name}_model_columns.joblib")
    encoder_file = os.path.join(cfg.MODEL_DIR, "label_encoder.joblib")

    if not all(os.path.exists(f) for f in [model_file, columns_file, encoder_file]):
        raise FileNotFoundError(
            f"Model artifacts for '{model_name}' not found. Please ensure all required files exist."
        )

    loaded_model = joblib.load(model_file)
    loaded_model_columns = joblib.load(columns_file)
    le = joblib.load(encoder_file)
    print(
        f"{model_name} Model, feature columns, and label encoder loaded for prediction."
    )
    return loaded_model, loaded_model_columns, le


def predict_obesity_risk(
    model_choice,
    Gender,
    Age,
    Height,
    Weight,
    family_history_with_overweight,
    FAVC,
    FCVC,
    NCP,
    CAEC,
    SMOKE,
    CH2O,
    SCC,
    FAF,
    TUE,
    CALC,
    MTRANS,
):
    """
    Predicts obesity risk based on input features and chosen model.
    """
    global label_encoder

    # Load the selected model's artifacts on first use, then reuse the cached copies
    if model_choice not in loaded_models:
        try:
            model, columns, le = load_model_artifacts(model_choice)
            loaded_models[model_choice] = model
            loaded_model_columns_map[model_choice] = columns
            if label_encoder is None:
                label_encoder = le
        except FileNotFoundError as e:
            return f"Error: {e}. Model '{model_choice}' not found. Please train the model first."
    else:
        model = loaded_models[model_choice]
        columns = loaded_model_columns_map[model_choice]
        le = label_encoder

    # Create a dictionary to hold the numerical input data
    input_data_dict = {
        "Age": Age,
        "Height": Height,
        "Weight": Weight,
        "FCVC": FCVC,
        "NCP": NCP,
        "CH2O": CH2O,
        "FAF": FAF,
        "TUE": TUE,
    }

    # Start from an all-zero row with the training columns. A float fill value keeps
    # pandas from upcasting int columns (with a FutureWarning) when floats are set.
    input_df = pd.DataFrame(0.0, index=[0], columns=columns)

    for col, value in input_data_dict.items():
        if col in input_df.columns:
            input_df.loc[0, col] = value

    # Handle one-hot encoded categorical features
    categorical_inputs = {
        "Gender": Gender,
        "family_history_with_overweight": family_history_with_overweight,
        "FAVC": FAVC,
        "CAEC": CAEC,
        "SMOKE": SMOKE,
        "SCC": SCC,
        "CALC": CALC,
        "MTRANS": MTRANS,
    }

    # Switch on the matching "<feature>_<value>" dummy column; unseen values stay all-zero
    for col_prefix, value in categorical_inputs.items():
        column_name = f"{col_prefix}_{value}"
        if column_name in input_df.columns:
            input_df.loc[0, column_name] = 1

    # Enforce the exact column order used during training
    input_df = input_df[columns]

    prediction_proba = model.predict_proba(input_df)[0]
    prediction_encoded = model.predict(input_df)[0]
    prediction_label = le.inverse_transform([prediction_encoded])[0]

    results = f"Using {model_choice} Model:\nPrediction: {prediction_label}\n\n--- Prediction Probabilities ---\n"
    # Assumes the model saw every encoded class, so probability columns follow le.classes_
    for i, class_name in enumerate(le.classes_):
        prob = prediction_proba[i] * 100
        results += f"{class_name}: {prob:.2f}%\n"

    return results


def launch_gradio_app(share=False):
    """
    Launches the Gradio web application for obesity risk prediction.
    """
    print("\n--- Starting Gradio App ---")

    # Define Gradio input components
    model_choice_input = gr.Dropdown(
        choices=cfg.MODEL_CHOICES, label="Select Model", value=cfg.RANDOM_FOREST
    )
    gender_input = gr.Dropdown(choices=["Female", "Male"], label="Gender")
    age_input = gr.Slider(minimum=1, maximum=100, step=1, label="Age")
    height_input = gr.Slider(minimum=1.0, maximum=2.2, step=0.01, label="Height (m)")
    weight_input = gr.Slider(minimum=30.0, maximum=200.0, step=0.1, label="Weight (kg)")
    family_history_input = gr.Radio(
        choices=["yes", "no"], label="Family History with Overweight"
    )
    favc_input = gr.Radio(
        choices=["yes", "no"], label="Frequent consumption of high caloric food (FAVC)"
    )
    fcvc_input = gr.Slider(
        minimum=1,
        maximum=3,
        step=1,
        label="Frequency of consumption of vegetables (FCVC)",
    )
    ncp_input = gr.Slider(
        minimum=1, maximum=4, step=1, label="Number of main meals (NCP)"
    )
    caec_input = gr.Dropdown(
        choices=["no", "Sometimes", "Frequently", "Always"],
        label="Consumption of food between meals (CAEC)",
    )
    smoke_input = gr.Radio(choices=["yes", "no"], label="SMOKE")
    ch2o_input = gr.Slider(
        minimum=1, maximum=3, step=1, label="Consumption of water daily (CH2O)"
    )
    scc_input = gr.Radio(
        choices=["yes", "no"], label="Calories consumption monitoring (SCC)"
    )
    faf_input = gr.Slider(
        minimum=0, maximum=3, step=1, label="Physical activity frequency (FAF)"
    )
    tue_input = gr.Slider(
        minimum=0, maximum=2, step=1, label="Time using technology devices (TUE)"
    )
    calc_input = gr.Dropdown(
        choices=["no", "Sometimes", "Frequently", "Always"],
        label="Consumption of alcohol (CALC)",
    )
    mtrans_input = gr.Dropdown(
        choices=["Automobile", "Motorbike", "Bike", "Public_Transportation", "Walking"],
        label="Transportation used (MTRANS)",
    )

    output_text = gr.Textbox(label="Obesity Risk Prediction Result", lines=10)

    # The input order must match the positional parameters of predict_obesity_risk
    iface = gr.Interface(
        fn=predict_obesity_risk,
        inputs=[
            model_choice_input,
            gender_input,
            age_input,
            height_input,
            weight_input,
            family_history_input,
            favc_input,
            fcvc_input,
            ncp_input,
            caec_input,
            smoke_input,
            ch2o_input,
            scc_input,
            faf_input,
            tue_input,
            calc_input,
            mtrans_input,
        ],
        outputs=output_text,
        title="Obesity Risk Prediction (Multi-Model)",
        description="Select a machine learning model and enter patient details to predict the obesity risk category.",
        examples=[
            [
                cfg.RANDOM_FOREST,
                "Male",
                25,
                1.8,
                85,
                "yes",
                "yes",
                2,
                3,
                "Sometimes",
                "no",
                2,
                "no",
                1,
                1,
                "Frequently",
                "Public_Transportation",
            ],
            [
                cfg.LIGHTGBM,
                "Female",
                30,
                1.65,
                70,
                "yes",
                "yes",
                3,
                3,
                "Frequently",
                "no",
                3,
                "yes",
                2,
                0,
                "Sometimes",
                "Automobile",
            ],
            [
                cfg.XGBOOST,
                "Female",
                21,
                1.52,
                56,
                "yes",
                "no",
                3,
                3,
                "Sometimes",
                "yes",
                3,
                "yes",
                3,
                0,
                "Sometimes",
                "Public_Transportation",
            ],
        ],
    )

    iface.launch(share=share)
    print("--- Gradio App Launched ---")


if __name__ == "__main__":
    launch_gradio_app(share=False)
```
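
`predict_obesity_risk` aligns a single request with the training features by starting from an all-zero row over the saved column list and switching on the matching `<feature>_<value>` dummy columns. An equivalent, more compact pattern (an alternative, not what `app.py` does) is to one-hot encode the request with `pd.get_dummies` and then `reindex` it to the saved columns. The `train_columns` list below is a hypothetical stand-in for the contents of a `*_model_columns.joblib` file.

```python
import pandas as pd

# Hypothetical training-time column order (the real list lives in *_model_columns.joblib)
train_columns = [
    "Age",
    "Gender_Female",
    "Gender_Male",
    "MTRANS_Public_Transportation",
    "MTRANS_Walking",
]

# One incoming request with raw categorical values
request = pd.DataFrame([{"Age": 25.0, "Gender": "Male", "MTRANS": "Walking"}])

# get_dummies only creates columns for values present in this single row;
# reindex adds every missing training column filled with 0 and fixes the order
encoded = pd.get_dummies(request, columns=["Gender", "MTRANS"], dtype=int).reindex(
    columns=train_columns, fill_value=0
)

print(encoded.iloc[0].tolist())  # [25.0, 0.0, 1.0, 0.0, 1.0]
```

As with the loop in `app.py`, a category value never seen during training has no matching column, so it is silently dropped and all of that feature's dummy columns stay at 0.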
model/LightGBM_model_columns.joblib
ADDED
@@ -0,0 +1,3 @@
```
version https://git-lfs.github.com/spec/v1
oid sha256:e29f9392a10215c7ab1d416c9bfae37b8cfc5e86ee2d6f8e640991913fa2f0a2
size 327
```
model/RandomForest_model_columns.joblib
ADDED
@@ -0,0 +1,3 @@
```
version https://git-lfs.github.com/spec/v1
oid sha256:e29f9392a10215c7ab1d416c9bfae37b8cfc5e86ee2d6f8e640991913fa2f0a2
size 327
```
model/XGBoost_model_columns.joblib
ADDED
@@ -0,0 +1,3 @@
```
version https://git-lfs.github.com/spec/v1
oid sha256:e29f9392a10215c7ab1d416c9bfae37b8cfc5e86ee2d6f8e640991913fa2f0a2
size 327
```
model/label_encoder.joblib
ADDED
@@ -0,0 +1,3 @@
```
version https://git-lfs.github.com/spec/v1
oid sha256:43bd445b421e9956b488fc242ab67cbe5c6adfde6447803285b0c6ae47d21587
size 608
```
model/obesity_LightGBM_model.joblib
ADDED
@@ -0,0 +1,3 @@
```
version https://git-lfs.github.com/spec/v1
oid sha256:da6364966e3d5f5d59d3d3db27046be8eff8ea1c8a5637fa8e111b0420f9e457
size 2418732
```
model/obesity_RandomForest_model.joblib
ADDED
@@ -0,0 +1,3 @@
```
version https://git-lfs.github.com/spec/v1
oid sha256:bd825a6560f7873a60f94bfdd377b37648e8daebd116e9b72b00b18d1d3c2b29
size 20145505
```
model/obesity_XGBoost_model.joblib
ADDED
@@ -0,0 +1,3 @@
```
version https://git-lfs.github.com/spec/v1
oid sha256:366d23a5061befeb45609742c6bfbe6c05d04907168ae150147df824457a4c68
size 3021443
```
obesity_rp/__init__.py
ADDED
File without changes
obesity_rp/config.py
ADDED
@@ -0,0 +1,27 @@
```python
# File and Directory Paths
DATASET_FILE = "datasets/ObesityDataSet_raw_and_data_sinthetic.csv"
MODEL_DIR = "model/"

# Target Variable
TARGET_COLUMN = "NObeyesdad"

# Feature Columns
CATEGORICAL_FEATURES = [
    "Gender",
    "family_history_with_overweight",
    "FAVC",
    "CAEC",
    "SMOKE",
    "SCC",
    "CALC",
    "MTRANS",
]

# Numerical Features
NUMERICAL_FEATURES = ["Age", "Height", "Weight", "FCVC", "NCP", "CH2O", "FAF", "TUE"]

# Model Identifiers
RANDOM_FOREST = "RandomForest"
LIGHTGBM = "LightGBM"
XGBOOST = "XGBoost"
MODEL_CHOICES = [RANDOM_FOREST, LIGHTGBM, XGBOOST]
```
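
`app.py` derives three artifact paths per model from `MODEL_DIR` and the identifiers in `MODEL_CHOICES`. The sketch below reproduces that naming convention so missing files can be listed up front, for example before launching the app. The helper names `expected_artifacts` and `missing_artifacts` are hypothetical, and the config values are copied inline so the snippet runs on its own.

```python
import os

# Values copied from obesity_rp/config.py so the sketch is self-contained
MODEL_DIR = "model/"
MODEL_CHOICES = ["RandomForest", "LightGBM", "XGBoost"]


def expected_artifacts(model_name, model_dir=MODEL_DIR):
    """Return the three files load_model_artifacts() expects for one model."""
    return [
        os.path.join(model_dir, f"obesity_{model_name}_model.joblib"),
        os.path.join(model_dir, f"{model_name}_model_columns.joblib"),
        os.path.join(model_dir, "label_encoder.joblib"),  # shared by all models
    ]


def missing_artifacts(model_dir=MODEL_DIR):
    """Map each model name to the artifact paths that do not exist on disk."""
    report = {}
    for name in MODEL_CHOICES:
        missing = [p for p in expected_artifacts(name, model_dir) if not os.path.exists(p)]
        if missing:
            report[name] = missing
    return report


print(expected_artifacts("LightGBM"))
```

An empty dictionary from `missing_artifacts()` means every model selectable in the dropdown can be loaded.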
requirements.txt
ADDED
@@ -0,0 +1,68 @@
```
aiofiles==24.1.0
annotated-types==0.7.0
anyio==4.10.0
brotli==1.1.0
certifi==2025.8.3
charset-normalizer==3.4.2
click==8.2.1
contourpy==1.3.2
cycler==0.12.1
exceptiongroup==1.3.0
fastapi==0.116.1
ffmpy==0.6.1
filelock==3.18.0
fonttools==4.59.0
fsspec==2025.7.0
gradio==5.41.0
gradio-client==1.11.0
groovy==0.1.2
h11==0.16.0
hf-xet==1.1.7
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==0.34.3
idna==3.10
jinja2==3.1.6
joblib==1.5.1
kiwisolver==1.4.8
lightgbm==4.6.0
markdown-it-py==3.0.0
markupsafe==3.0.2
matplotlib==3.10.5
mdurl==0.1.2
numpy==2.2.6
orjson==3.11.1
packaging==25.0
pandas==2.3.1
pillow==11.3.0
pydantic==2.11.7
pydantic-core==2.33.2
pydub==0.25.1
pygments==2.19.2
pyparsing==3.2.3
python-dateutil==2.9.0.post0
python-multipart==0.0.20
pytz==2025.2
pyyaml==6.0.2
requests==2.32.4
rich==14.1.0
ruff==0.12.7
safehttpx==0.1.6
scikit-learn==1.7.1
scipy==1.15.3
semantic-version==2.10.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
starlette==0.47.2
threadpoolctl==3.6.0
tomlkit==0.13.3
tqdm==4.67.1
typer==0.16.0
typing-extensions==4.14.1
typing-inspection==0.4.1
tzdata==2025.2
urllib3==2.5.0
uvicorn==0.35.0
websockets==15.0.1
xgboost==3.0.3
```