mrs83 committed
Commit 697fe11 · 1 Parent(s): 60dad93

initial import
README.md CHANGED
@@ -1,12 +1,177 @@
1
- ---
2
- title: ObesityRiskPredictor
3
- emoji: 👁
4
- colorFrom: gray
5
- colorTo: purple
6
- sdk: gradio
7
- sdk_version: 5.43.1
8
- app_file: app.py
9
- pinned: false
10
- ---
11
-
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ Multi-Model Performance Analysis for Obesity Risk Classification
2
+ ================================================================
3
+
4
+ 1\. Project Overview
5
+ --------------------
6
+
7
+ [This project](https://github.com/ethicalabs-ai/ObesityRiskPredictor) provides a framework for training, evaluating, and comparing the performance of several prominent machine learning models on the multi-class classification task of obesity risk prediction.
8
+
9
+ The primary objective is to conduct a comparative analysis to determine which modeling approach yields the highest predictive accuracy and robustness for this specific dataset.
10
+
11
+ The experiment is designed to serve as a benchmark, showcasing a standardized pipeline that includes data preprocessing, hyperparameter optimization, and rigorous model evaluation.
12
+
13
+ By comparing an ensemble bagging model (Random Forest) against two gradient boosting implementations (LightGBM and XGBoost), we aim to identify the most effective architecture for this type of tabular data problem.
14
+
15
+ 2\. Technical Architecture & Methodologies
16
+ ------------------------------------------
17
+
18
+ ### 2.1. Models Evaluated
19
+
20
+ The core of this experiment is the evaluation of three distinct tree-based ensemble models (an illustrative instantiation follows the list below):
21
+
22
+ - **Random Forest Classifier:** An ensemble method based on bagging. It operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes of the individual trees. It is known for its robustness and ability to handle high-dimensional data.
23
+ - **LightGBM (Light Gradient Boosting Machine):** A high-performance gradient boosting framework that uses tree-based learning algorithms. It is distinguished by its use of histogram-based algorithms and leaf-wise tree growth, which results in significantly faster training speeds and lower memory usage compared to other boosting methods.
24
+ - **XGBoost (eXtreme Gradient Boosting):** An optimized, distributed gradient boosting library designed for efficiency, flexibility, and portability. It implements machine learning algorithms under the gradient boosting framework and provides parallel tree boosting that is both fast and accurate on tabular data.
25
+
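+ As a rough sketch only (the constructor arguments shown here are illustrative placeholders; the actual hyperparameters come from the search described in section 2.4), the three classifiers might be instantiated like this:
+
+ ```python
+ # Illustrative instantiation of the three benchmarked classifiers.
+ # n_estimators and random_state are placeholder values, not the tuned ones.
+ from sklearn.ensemble import RandomForestClassifier
+ from lightgbm import LGBMClassifier
+ from xgboost import XGBClassifier
+
+ models = {
+     "RandomForest": RandomForestClassifier(n_estimators=200, random_state=42),
+     "LightGBM": LGBMClassifier(n_estimators=200, random_state=42),
+     "XGBoost": XGBClassifier(n_estimators=200, random_state=42, eval_metric="mlogloss"),
+ }
+ ```
+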
26
+ ### 2.2. Dataset
27
+
28
+ The analysis is performed on the **"Estimation of Obesity Levels Based On Eating Habits and Physical Condition"** dataset.
29
+
30
+ - **Task Type:** Multi-Class Classification
31
+ - **Features:** The dataset comprises a mix of numerical (e.g., `Age`, `Height`, `Weight`) and categorical (e.g., `Gender`, `family_history_with_overweight`, `MTRANS`) variables.
32
+ - **Instances:** 2111
33
+ - **Attributes:** 16 predictive features and 1 target class (`NObeyesdad`).
34
+
35
+ #### 2.2.1. Dataset Source and Composition
36
+
37
+ This dataset was created to estimate obesity levels in individuals from Mexico, Peru, and Colombia. It is composed of both real and synthetically generated data:
38
+
39
+ - **23%** of the data was collected directly from users via a web platform.
40
+ - **77%** of the data was generated synthetically using the SMOTE (Synthetic Minority Over-sampling Technique) filter in Weka to address class imbalance (a rough Python analogue is sketched below).
41
+
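+ The dataset authors applied SMOTE in Weka; purely for illustration (imbalanced-learn is not a dependency of this project), an equivalent oversampling step in Python might look like:
+
+ ```python
+ # Illustration only: oversample minority classes with SMOTE, assuming the
+ # feature matrix X and encoded labels y have already been prepared.
+ from imblearn.over_sampling import SMOTE
+
+ smote = SMOTE(random_state=42)
+ X_resampled, y_resampled = smote.fit_resample(X, y)
+ ```
+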
42
+ #### 2.2.2. Citation
43
+
44
+ Proper credit is given to the creators of this dataset.
45
+
46
+ - **Source:** [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/544/estimation+of+obesity+levels+based+on+eating+habits+and+physical+condition)
47
+ - **Creators:** Palechor, F. M., & de la Hoz Manotas, A. (2019).
48
+
49
+ ### 2.3. Data Preprocessing Pipeline
50
+
51
+ A standardized preprocessing pipeline is applied to ensure data quality and compatibility with the machine learning models; a minimal sketch follows the list below:
52
+
53
+ - **Categorical Feature Encoding:** **One-Hot Encoding** is applied to all nominal categorical features. This transforms categorical data into a numerical format without introducing an ordinal relationship, creating binary columns for each category.
54
+ - **Target Variable Encoding:** The multi-class target variable (`NObeyesdad`) is converted into numerical format using **Label Encoding**.
55
+
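+ A minimal sketch of these two encoding steps, assuming the raw dataset has been loaded into a pandas DataFrame named `df`:
+
+ ```python
+ # Minimal preprocessing sketch: one-hot encode the categorical features and
+ # label-encode the target. Column names follow the dataset described above.
+ import pandas as pd
+ from sklearn.preprocessing import LabelEncoder
+
+ TARGET = "NObeyesdad"
+
+ X = pd.get_dummies(df.drop(columns=[TARGET]))  # binary column per category
+
+ le = LabelEncoder()
+ y = le.fit_transform(df[TARGET])               # classes mapped to 0..6
+ ```
+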
56
+ ### 2.4. Hyperparameter Optimization
57
+
58
+ To ensure each model performs optimally, we employ a systematic hyperparameter tuning strategy (sketched after the list below):
59
+
60
+ - **Strategy:** `RandomizedSearchCV` is utilized to efficiently search a defined parameter space for each model. This approach samples a fixed number of parameter combinations from the specified distributions, offering a strong balance between computational cost and tuning effectiveness.
61
+ - **Cross-Validation:** `StratifiedKFold` cross-validation (with 5 splits) is used within the search process. This ensures that each fold is a representative sample of the overall class distribution, which is critical for maintaining robust evaluation on multi-class datasets that may have imbalanced classes.
62
+ - **Optimization Metric:** The primary scoring metric used to identify the best parameter set during the search is **Accuracy**.
63
+
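+ A sketch of this search, assuming a preprocessed feature matrix `X`, encoded labels `y`, an estimator `model`, and a parameter space `param_distributions` are already defined (the names and `n_iter` value are illustrative):
+
+ ```python
+ # Randomized search with stratified 5-fold cross-validation, scored on accuracy.
+ from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
+
+ cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
+ search = RandomizedSearchCV(
+     estimator=model,
+     param_distributions=param_distributions,
+     n_iter=20,           # number of sampled parameter combinations (illustrative)
+     scoring="accuracy",  # optimization metric used during the search
+     cv=cv,
+     random_state=42,
+     n_jobs=-1,
+ )
+ search.fit(X, y)
+ best_model = search.best_estimator_
+ ```
+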
64
+ ### 2.5. Model Evaluation
65
+
66
+ The performance of the fine-tuned models is assessed using a standard set of classification metrics (see the sketch after this list):
67
+
68
+ - **Overall Accuracy:** The primary measure of the model's ability to make correct predictions across all classes.
69
+ - **Classification Report:** A detailed report providing class-wise performance metrics, including:
70
+ - **Precision:** The ability of the classifier not to label as positive a sample that is negative.
71
+ - **Recall (Sensitivity):** The ability of the classifier to find all the positive samples.
72
+ - **F1-Score:** The weighted harmonic mean of precision and recall.
73
+
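+ A sketch of this evaluation step, assuming the tuned `best_model` from the previous sketch, a held-out test split (`X_test`, `y_test`), and the fitted `LabelEncoder` (`le`) from the preprocessing stage:
+
+ ```python
+ # Compute overall accuracy and the per-class classification report.
+ from sklearn.metrics import accuracy_score, classification_report
+
+ y_pred = best_model.predict(X_test)
+ print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
+ print(classification_report(y_test, y_pred, target_names=le.classes_))
+ ```
+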
75
+ ### 2.6. Evaluation Results
76
+
77
+ ```
78
+ --- Starting Model Evaluation ---
79
+ Attempting to load dataset from 'datasets/ObesityDataSet_raw_and_data_sinthetic.csv'...
80
+ Dataset loaded successfully.
81
+ Dataset Head:
82
+ Gender Age Height Weight family_history_with_overweight FAVC FCVC NCP ... SMOKE CH2O SCC FAF TUE CALC MTRANS NObeyesdad
83
+ 0 Female 21.0 1.62 64.0 yes no 2.0 3.0 ... no 2.0 no 0.0 1.0 no Public_Transportation Normal_Weight
84
+ 1 Female 21.0 1.52 56.0 yes no 3.0 3.0 ... yes 3.0 yes 3.0 0.0 Sometimes Public_Transportation Normal_Weight
85
+ 2 Male 23.0 1.80 77.0 yes no 2.0 3.0 ... no 2.0 no 2.0 1.0 Frequently Public_Transportation Normal_Weight
86
+ 3 Male 27.0 1.80 87.0 no no 3.0 3.0 ... no 2.0 no 2.0 0.0 Frequently Walking Overweight_Level_I
87
+ 4 Male 22.0 1.78 89.8 no no 2.0 1.0 ... no 2.0 no 0.0 0.0 Sometimes Public_Transportation Overweight_Level_II
88
+
89
+ [5 rows x 17 columns]
90
+
91
+ Dataset Info:
92
+ <class 'pandas.core.frame.DataFrame'>
93
+ RangeIndex: 2111 entries, 0 to 2110
94
+ Data columns (total 17 columns):
95
+ # Column Non-Null Count Dtype
96
+ --- ------ -------------- -----
97
+ 0 Gender 2111 non-null object
98
+ 1 Age 2111 non-null float64
99
+ 2 Height 2111 non-null float64
100
+ 3 Weight 2111 non-null float64
101
+ 4 family_history_with_overweight 2111 non-null object
102
+ 5 FAVC 2111 non-null object
103
+ 6 FCVC 2111 non-null float64
104
+ 7 NCP 2111 non-null float64
105
+ 8 CAEC 2111 non-null object
106
+ 9 SMOKE 2111 non-null object
107
+ 10 CH2O 2111 non-null float64
108
+ 11 SCC 2111 non-null object
109
+ 12 FAF 2111 non-null float64
110
+ 13 TUE 2111 non-null float64
111
+ 14 CALC 2111 non-null object
112
+ 15 MTRANS 2111 non-null object
113
+ 16 NObeyesdad 2111 non-null object
114
+ dtypes: float64(8), object(9)
115
+ memory usage: 280.5+ KB
116
+
117
+ Preprocessing data...
118
+ Target classes mapped: {'Insufficient_Weight': np.int64(0), 'Normal_Weight': np.int64(1), 'Obesity_Type_I': np.int64(2), 'Obesity_Type_II': np.int64(3), 'Obesity_Type_III': np.int64(4), 'Overweight_Level_I': np.int64(5), 'Overweight_Level_II': np.int64(6)}
119
+ RandomForest Model, feature columns, and label encoder loaded for prediction.
120
+
121
+ Evaluating RandomForest performance...
122
+ RandomForest Accuracy: 0.9480
123
+ RandomForest Classification Report:
124
+ precision recall f1-score support
125
+
126
+ Insufficient_Weight 1.00 0.93 0.96 54
127
+ Normal_Weight 0.79 0.97 0.87 58
128
+ Obesity_Type_I 0.94 0.97 0.96 70
129
+ Obesity_Type_II 1.00 0.98 0.99 60
130
+ Obesity_Type_III 1.00 0.98 0.99 65
131
+ Overweight_Level_I 0.96 0.84 0.90 58
132
+ Overweight_Level_II 0.98 0.95 0.96 58
133
+
134
+ accuracy 0.95 423
135
+ macro avg 0.95 0.95 0.95 423
136
+ weighted avg 0.95 0.95 0.95 423
137
+
138
+ LightGBM Model, feature columns, and label encoder loaded for prediction.
139
+
140
+ Evaluating LightGBM performance...
141
+ LightGBM Accuracy: 0.9716
142
+ LightGBM Classification Report:
143
+ precision recall f1-score support
144
+
145
+ Insufficient_Weight 1.00 0.94 0.97 54
146
+ Normal_Weight 0.89 1.00 0.94 58
147
+ Obesity_Type_I 0.96 0.99 0.97 70
148
+ Obesity_Type_II 1.00 0.98 0.99 60
149
+ Obesity_Type_III 1.00 0.98 0.99 65
150
+ Overweight_Level_I 0.98 0.91 0.95 58
151
+ Overweight_Level_II 0.98 0.98 0.98 58
152
+
153
+ accuracy 0.97 423
154
+ macro avg 0.97 0.97 0.97 423
155
+ weighted avg 0.97 0.97 0.97 423
156
+
157
+ XGBoost Model, feature columns, and label encoder loaded for prediction.
158
+
159
+ Evaluating XGBoost performance...
160
+ XGBoost Accuracy: 0.9527
161
+ XGBoost Classification Report:
162
+ precision recall f1-score support
163
+
164
+ Insufficient_Weight 0.98 0.89 0.93 54
165
+ Normal_Weight 0.82 0.97 0.89 58
166
+ Obesity_Type_I 0.97 0.97 0.97 70
167
+ Obesity_Type_II 0.98 0.98 0.98 60
168
+ Obesity_Type_III 1.00 0.98 0.99 65
169
+ Overweight_Level_I 0.96 0.90 0.93 58
170
+ Overweight_Level_II 0.97 0.97 0.97 58
171
+
172
+ accuracy 0.95 423
173
+ macro avg 0.96 0.95 0.95 423
174
+ weighted avg 0.96 0.95 0.95 423
175
+
176
+ --- Model Evaluation Finished ---
177
+ ```
app.py ADDED
@@ -0,0 +1,274 @@
1
+ import pandas as pd
2
+
3
+
4
+ import lightgbm as lgb
5
+ import xgboost as xgb
6
+ import gradio as gr
7
+ import joblib
8
+ import os
9
+
10
+ from obesity_rp import config as cfg
11
+
12
+ # Global variables to store loaded models, their columns, and the label encoder
13
+ loaded_models = {}
14
+ loaded_model_columns_map = {}
15
+ label_encoder = None
16
+
17
+
18
+ def load_model_artifacts(model_name):
19
+ """
20
+ Loads the trained model, feature columns, and the label encoder.
21
+ """
22
+ model_file = os.path.join(cfg.MODEL_DIR, f"obesity_{model_name}_model.joblib")
23
+ columns_file = os.path.join(cfg.MODEL_DIR, f"{model_name}_model_columns.joblib")
24
+ encoder_file = os.path.join(cfg.MODEL_DIR, "label_encoder.joblib")
25
+
26
+ if not all(os.path.exists(f) for f in [model_file, columns_file, encoder_file]):
27
+ raise FileNotFoundError(
28
+ f"Model artifacts for '{model_name}' not found. Please ensure all required files exist."
29
+ )
30
+
31
+ loaded_model = joblib.load(model_file)
32
+ loaded_model_columns = joblib.load(columns_file)
33
+ le = joblib.load(encoder_file)
34
+ print(
35
+ f"{model_name} Model, feature columns, and label encoder loaded for prediction."
36
+ )
37
+ return loaded_model, loaded_model_columns, le
38
+
39
+
40
+ def predict_obesity_risk(
41
+ model_choice,
42
+ Gender,
43
+ Age,
44
+ Height,
45
+ Weight,
46
+ family_history_with_overweight,
47
+ FAVC,
48
+ FCVC,
49
+ NCP,
50
+ CAEC,
51
+ SMOKE,
52
+ CH2O,
53
+ SCC,
54
+ FAF,
55
+ TUE,
56
+ CALC,
57
+ MTRANS,
58
+ ):
59
+ """
60
+ Predicts obesity risk based on input features and chosen model.
61
+ """
62
+ global label_encoder
63
+
64
+ if model_choice not in loaded_models:
65
+ try:
66
+ model, columns, le = load_model_artifacts(model_choice)
67
+ loaded_models[model_choice] = model
68
+ loaded_model_columns_map[model_choice] = columns
69
+ if label_encoder is None:
70
+ label_encoder = le
71
+ except FileNotFoundError as e:
72
+ return f"Error: {e}. Model '{model_choice}' not found. Please train the model first."
73
+ else:
74
+ model = loaded_models[model_choice]
75
+ columns = loaded_model_columns_map[model_choice]
76
+ le = label_encoder
77
+
78
+ # Create a dictionary to hold the input data
79
+ input_data_dict = {
80
+ "Age": Age,
81
+ "Height": Height,
82
+ "Weight": Weight,
83
+ "FCVC": FCVC,
84
+ "NCP": NCP,
85
+ "CH2O": CH2O,
86
+ "FAF": FAF,
87
+ "TUE": TUE,
88
+ }
89
+
90
+ input_df = pd.DataFrame(0, index=[0], columns=columns)
91
+
92
+ for col, value in input_data_dict.items():
93
+ if col in input_df.columns:
94
+ input_df.loc[0, col] = value
95
+
96
+ # Handle one-hot encoded categorical features
97
+ categorical_inputs = {
98
+ "Gender": Gender,
99
+ "family_history_with_overweight": family_history_with_overweight,
100
+ "FAVC": FAVC,
101
+ "CAEC": CAEC,
102
+ "SMOKE": SMOKE,
103
+ "SCC": SCC,
104
+ "CALC": CALC,
105
+ "MTRANS": MTRANS,
106
+ }
107
+
108
+ for col_prefix, value in categorical_inputs.items():
109
+ column_name = f"{col_prefix}_{value}"
110
+ if column_name in input_df.columns:
111
+ input_df.loc[0, column_name] = 1
112
+
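+ # Reorder columns to match the feature order used during training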
113
+ input_df = input_df[columns]
114
+
115
+ prediction_proba = model.predict_proba(input_df)[0]
116
+ prediction_encoded = model.predict(input_df)[0]
117
+ prediction_label = le.inverse_transform([prediction_encoded])[0]
118
+
119
+ results = f"Using {model_choice} Model:\nPrediction: {prediction_label}\n\n--- Prediction Probabilities ---\n"
120
+ for i, class_name in enumerate(le.classes_):
121
+ prob = prediction_proba[i] * 100
122
+ results += f"{class_name}: {prob:.2f}%\n"
123
+
124
+ return results
125
+
126
+
127
+ def launch_gradio_app(share=False):
128
+ """
129
+ Launches the Gradio web application for obesity risk prediction.
130
+ """
131
+ print("\n--- Starting Gradio App ---")
132
+
133
+ # Define Gradio input components
134
+ model_choice_input = gr.Dropdown(
135
+ choices=cfg.MODEL_CHOICES, label="Select Model", value=cfg.RANDOM_FOREST
136
+ )
137
+ gender_input = gr.Dropdown(choices=["Female", "Male"], label="Gender")
138
+ age_input = gr.Slider(minimum=1, maximum=100, step=1, label="Age")
139
+ height_input = gr.Slider(minimum=1.0, maximum=2.2, step=0.01, label="Height (m)")
140
+ weight_input = gr.Slider(minimum=30.0, maximum=200.0, step=0.1, label="Weight (kg)")
141
+ family_history_input = gr.Radio(
142
+ choices=["yes", "no"], label="Family History with Overweight"
143
+ )
144
+ favc_input = gr.Radio(
145
+ choices=["yes", "no"], label="Frequent consumption of high caloric food (FAVC)"
146
+ )
147
+ fcvc_input = gr.Slider(
148
+ minimum=1,
149
+ maximum=3,
150
+ step=1,
151
+ label="Frequency of consumption of vegetables (FCVC)",
152
+ )
153
+ ncp_input = gr.Slider(
154
+ minimum=1, maximum=4, step=1, label="Number of main meals (NCP)"
155
+ )
156
+ caec_input = gr.Dropdown(
157
+ choices=["no", "Sometimes", "Frequently", "Always"],
158
+ label="Consumption of food between meals (CAEC)",
159
+ )
160
+ smoke_input = gr.Radio(choices=["yes", "no"], label="SMOKE")
161
+ ch2o_input = gr.Slider(
162
+ minimum=1, maximum=3, step=1, label="Consumption of water daily (CH2O)"
163
+ )
164
+ scc_input = gr.Radio(
165
+ choices=["yes", "no"], label="Calories consumption monitoring (SCC)"
166
+ )
167
+ faf_input = gr.Slider(
168
+ minimum=0, maximum=3, step=1, label="Physical activity frequency (FAF)"
169
+ )
170
+ tue_input = gr.Slider(
171
+ minimum=0, maximum=2, step=1, label="Time using technology devices (TUE)"
172
+ )
173
+ calc_input = gr.Dropdown(
174
+ choices=["no", "Sometimes", "Frequently", "Always"],
175
+ label="Consumption of alcohol (CALC)",
176
+ )
177
+ mtrans_input = gr.Dropdown(
178
+ choices=["Automobile", "Motorbike", "Bike", "Public_Transportation", "Walking"],
179
+ label="Transportation used (MTRANS)",
180
+ )
181
+
182
+ output_text = gr.Textbox(label="Obesity Risk Prediction Result", lines=10)
183
+
184
+ iface = gr.Interface(
185
+ fn=predict_obesity_risk,
186
+ inputs=[
187
+ model_choice_input,
188
+ gender_input,
189
+ age_input,
190
+ height_input,
191
+ weight_input,
192
+ family_history_input,
193
+ favc_input,
194
+ fcvc_input,
195
+ ncp_input,
196
+ caec_input,
197
+ smoke_input,
198
+ ch2o_input,
199
+ scc_input,
200
+ faf_input,
201
+ tue_input,
202
+ calc_input,
203
+ mtrans_input,
204
+ ],
205
+ outputs=output_text,
206
+ title="Obesity Risk Prediction (Multi-Model)",
207
+ description="Select a machine learning model and enter patient details to predict the obesity risk category.",
208
+ examples=[
209
+ [
210
+ cfg.RANDOM_FOREST,
211
+ "Male",
212
+ 25,
213
+ 1.8,
214
+ 85,
215
+ "yes",
216
+ "yes",
217
+ 2,
218
+ 3,
219
+ "Sometimes",
220
+ "no",
221
+ 2,
222
+ "no",
223
+ 1,
224
+ 1,
225
+ "Frequently",
226
+ "Public_Transportation",
227
+ ],
228
+ [
229
+ cfg.LIGHTGBM,
230
+ "Female",
231
+ 30,
232
+ 1.65,
233
+ 70,
234
+ "yes",
235
+ "yes",
236
+ 3,
237
+ 3,
238
+ "Frequently",
239
+ "no",
240
+ 3,
241
+ "yes",
242
+ 2,
243
+ 0,
244
+ "Sometimes",
245
+ "Automobile",
246
+ ],
247
+ [
248
+ cfg.XGBOOST,
249
+ "Female",
250
+ 21,
251
+ 1.52,
252
+ 56,
253
+ "yes",
254
+ "no",
255
+ 3,
256
+ 3,
257
+ "Sometimes",
258
+ "yes",
259
+ 3,
260
+ "yes",
261
+ 3,
262
+ 0,
263
+ "Sometimes",
264
+ "Public_Transportation",
265
+ ],
266
+ ],
267
+ )
268
+
269
+ iface.launch(share=share)
270
+ print("--- Gradio App Launched ---")
271
+
272
+
273
+ if __name__ == "__main__":
274
+ launch_gradio_app(share=False)
model/LightGBM_model_columns.joblib ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e29f9392a10215c7ab1d416c9bfae37b8cfc5e86ee2d6f8e640991913fa2f0a2
3
+ size 327
model/RandomForest_model_columns.joblib ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e29f9392a10215c7ab1d416c9bfae37b8cfc5e86ee2d6f8e640991913fa2f0a2
3
+ size 327
model/XGBoost_model_columns.joblib ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e29f9392a10215c7ab1d416c9bfae37b8cfc5e86ee2d6f8e640991913fa2f0a2
3
+ size 327
model/label_encoder.joblib ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:43bd445b421e9956b488fc242ab67cbe5c6adfde6447803285b0c6ae47d21587
3
+ size 608
model/obesity_LightGBM_model.joblib ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:da6364966e3d5f5d59d3d3db27046be8eff8ea1c8a5637fa8e111b0420f9e457
3
+ size 2418732
model/obesity_RandomForest_model.joblib ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bd825a6560f7873a60f94bfdd377b37648e8daebd116e9b72b00b18d1d3c2b29
3
+ size 20145505
model/obesity_XGBoost_model.joblib ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:366d23a5061befeb45609742c6bfbe6c05d04907168ae150147df824457a4c68
3
+ size 3021443
obesity_rp/__init__.py ADDED
File without changes
obesity_rp/config.py ADDED
@@ -0,0 +1,27 @@
1
+ # File and Directory Paths
2
+ DATASET_FILE = "datasets/ObesityDataSet_raw_and_data_sinthetic.csv"
3
+ MODEL_DIR = "model/"
4
+
5
+ # Target Variable
6
+ TARGET_COLUMN = "NObeyesdad"
7
+
8
+ # Feature Columns
9
+ CATEGORICAL_FEATURES = [
10
+ "Gender",
11
+ "family_history_with_overweight",
12
+ "FAVC",
13
+ "CAEC",
14
+ "SMOKE",
15
+ "SCC",
16
+ "CALC",
17
+ "MTRANS",
18
+ ]
19
+
20
+ # Numerical Features
21
+ NUMERICAL_FEATURES = ["Age", "Height", "Weight", "FCVC", "NCP", "CH2O", "FAF", "TUE"]
22
+
23
+ # Model Identifiers
24
+ RANDOM_FOREST = "RandomForest"
25
+ LIGHTGBM = "LightGBM"
26
+ XGBOOST = "XGBoost"
27
+ MODEL_CHOICES = [RANDOM_FOREST, LIGHTGBM, XGBOOST]
requirements.txt ADDED
@@ -0,0 +1,68 @@
1
+ aiofiles==24.1.0
2
+ annotated-types==0.7.0
3
+ anyio==4.10.0
4
+ brotli==1.1.0
5
+ certifi==2025.8.3
6
+ charset-normalizer==3.4.2
7
+ click==8.2.1
8
+ contourpy==1.3.2
9
+ cycler==0.12.1
10
+ exceptiongroup==1.3.0
11
+ fastapi==0.116.1
12
+ ffmpy==0.6.1
13
+ filelock==3.18.0
14
+ fonttools==4.59.0
15
+ fsspec==2025.7.0
16
+ gradio==5.41.0
17
+ gradio-client==1.11.0
18
+ groovy==0.1.2
19
+ h11==0.16.0
20
+ hf-xet==1.1.7
21
+ httpcore==1.0.9
22
+ httpx==0.28.1
23
+ huggingface-hub==0.34.3
24
+ idna==3.10
25
+ jinja2==3.1.6
26
+ joblib==1.5.1
27
+ kiwisolver==1.4.8
28
+ lightgbm==4.6.0
29
+ markdown-it-py==3.0.0
30
+ markupsafe==3.0.2
31
+ matplotlib==3.10.5
32
+ mdurl==0.1.2
33
+ numpy==2.2.6
34
+ orjson==3.11.1
35
+ packaging==25.0
36
+ pandas==2.3.1
37
+ pillow==11.3.0
38
+ pydantic==2.11.7
39
+ pydantic-core==2.33.2
40
+ pydub==0.25.1
41
+ pygments==2.19.2
42
+ pyparsing==3.2.3
43
+ python-dateutil==2.9.0.post0
44
+ python-multipart==0.0.20
45
+ pytz==2025.2
46
+ pyyaml==6.0.2
47
+ requests==2.32.4
48
+ rich==14.1.0
49
+ ruff==0.12.7
50
+ safehttpx==0.1.6
51
+ scikit-learn==1.7.1
52
+ scipy==1.15.3
53
+ semantic-version==2.10.0
54
+ shellingham==1.5.4
55
+ six==1.17.0
56
+ sniffio==1.3.1
57
+ starlette==0.47.2
58
+ threadpoolctl==3.6.0
59
+ tomlkit==0.13.3
60
+ tqdm==4.67.1
61
+ typer==0.16.0
62
+ typing-extensions==4.14.1
63
+ typing-inspection==0.4.1
64
+ tzdata==2025.2
65
+ urllib3==2.5.0
66
+ uvicorn==0.35.0
67
+ websockets==15.0.1
68
+ xgboost==3.0.3