priyanka17
committed on
Update README.md
README.md
CHANGED
@@ -118,12 +118,24 @@ The dataset was compiled from https://huggingface.co/datasets/duxprajapati/sympt
|
|
118 |
which was then labeled using Smabbler's QueryLab platform to ensure an accurate representation of data labels for common and rare diseases.
|
119 |
|
120 |
|
121 |
-
Pre-processing:
|
122 |
|
123 |
-
|
124 |
Each symptom was converted into a binary feature (0 or 1), indicating its absence or presence respectively.
|
125 |
The labels were mapped to specific diseases using a detailed mapping file to ensure accurate representation.
|
126 |
|
127 |
Label Mapping:
|
128 |
|
129 |
The labels in the dataset correspond to various diseases. A mapping file (mapping.json) was used to translate encoded labels to human-readable disease names.
|
|
|
118 |
which was then labeled using Smabbler's QueryLab platform to ensure an accurate representation of data labels for common and rare diseases.
|
119 |
|
120 |
|
121 |
+
### Pre-processing:
|
122 |
|
123 |
+
The pre-processing stage is crucial to building an accurate machine learning model and to ensuring it is reliable enough for use in the medical domain.
|
124 |
+
It involves a labor-intensive data cleaning process, with extensive manual checks for consistency and iterative validation to keep the final dataset high quality.
|
125 |
+
These processes are particularly complex when dealing with medical data.
|
126 |
+
|
127 |
+
Here the data was pre-processed to ensure consistency and accuracy. This involved cleaning the data, handling missing values, and normalizing the binary encoding.
|
128 |
Each symptom was converted into a binary feature (0 or 1), indicating its absence or presence respectively.
|
129 |
The labels were mapped to specific diseases using a detailed mapping file to ensure accurate representation.
|
130 |
|
131 |
+
Smabbler simplified pre-processing by providing automated labeling, reducing manual effort, ensuring consistency,
|
132 |
+
and maintaining high accuracy in the pre-processed dataset,
|
133 |
+
making it a crucial asset in building a reliable disease diagnostic model.
|
134 |
+
|
135 |
+
The data cleaning process, which would have been labor-intensive and time-consuming, was significantly expedited by Smabbler's tools and features. The platform's automation,
|
136 |
+
standardization, and validation capabilities ensured that the pre-processing was not only quicker but also more reliable and accurate.
|
137 |
+
|
138 |
+
|
139 |
Label Mapping:
|
140 |
|
141 |
The labels in the dataset correspond to various diseases. A mapping file (mapping.json) was used to translate encoded labels to human-readable disease names.
|
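
The binary symptom encoding and the label mapping described in this README can be illustrated with a minimal sketch. It is not the pipeline used to build the dataset. The symptom vocabulary, the disease names, and the helper functions `encode_symptoms`, `load_mapping`, and `decode_label` are all hypothetical. The sketch also assumes that `mapping.json` maps encoded labels (as string keys) to disease names; the actual file may be laid out differently.

```python
import json

# Hypothetical symptom vocabulary; the real dataset's columns may differ.
SYMPTOMS = ["fever", "cough", "headache", "fatigue"]


def encode_symptoms(present, vocabulary=SYMPTOMS):
    """Return a 0/1 vector: 1 if the symptom is present, 0 if absent."""
    present_set = {s.strip().lower() for s in present}
    return [1 if name in present_set else 0 for name in vocabulary]


def load_mapping(path):
    """Load a label -> disease-name mapping from a JSON file.

    Assumes the file looks like {"0": "Disease A", "1": "Disease B"}.
    JSON object keys are always strings, so they are converted to int.
    """
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {int(label): name for label, name in raw.items()}


def decode_label(label, mapping):
    """Translate an encoded label into a human-readable disease name."""
    return mapping.get(int(label), "unknown")


# Inline stand-in for mapping.json with made-up entries, for illustration only.
example_mapping = {
    int(label): name
    for label, name in json.loads('{"0": "Influenza", "1": "Migraine"}').items()
}

features = encode_symptoms(["Fever", "cough"])
disease = decode_label(0, example_mapping)
```

With the real file, `load_mapping("mapping.json")` would take the place of the inline `example_mapping`, provided the file follows the assumed layout.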