fix
README.md
@@ -16,7 +16,7 @@ The script supports processing files in both **Parquet** and **JSONL** formats.
 
 - **Efficient Batch Processing**: Processes all texts from a file at once, minimizing I/O and leveraging vectorized computations for high performance.
 - **Dual Format Support**: Ingests data from either `.parquet` or `.jsonl` files.
-- **Robust Feature Extraction**: Relies on a sophisticated feature engineering
+- **Robust Feature Extraction**: Relies on sophisticated feature-engineering modules in the `features` folder to generate over 200 linguistic metrics for accurate classification.
 - **Scalable**: Capable of handling millions of documents by processing files sequentially and texts in parallel.
 - **Seamless Integration**: Appends classification results (`quality_ai` and `confidence`) directly to the original data, preserving all existing columns/keys.
 - **User-Friendly Progress**: Displays a `tqdm` progress bar to monitor the analysis in real time.
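The "Efficient Batch Processing" and "Seamless Integration" bullets describe two behaviors. The script classifies a whole file's texts in one call, then appends `quality_ai` and `confidence` to each record without disturbing the other keys. Below is a minimal standard-library sketch of that flow for the JSONL case. It rests on two assumptions, neither taken from the repository. First, each record stores its text under a hypothetical `text` key. Second, the hypothetical `toy_classifier` stands in for the script's real feature-based model.

```python
import json
from typing import Callable, Iterable, List, Tuple

# Hypothetical classifier signature: takes a batch of texts and returns
# one (label, confidence) pair per text, in the same order.
Classifier = Callable[[List[str]], List[Tuple[str, float]]]


def annotate_jsonl(lines: Iterable[str], classify_batch: Classifier,
                   text_key: str = "text") -> List[str]:
    """Set `quality_ai` and `confidence` on every JSONL record."""
    # Parse all non-empty lines first so the whole file is classified in one batch.
    records = [json.loads(line) for line in lines if line.strip()]
    texts = [str(record.get(text_key, "")) for record in records]
    results = classify_batch(texts)
    if len(results) != len(records):
        raise ValueError("classifier must return one result per record")
    for record, (label, confidence) in zip(records, results):
        # All other keys are preserved; only the two result fields are set.
        record["quality_ai"] = label
        record["confidence"] = confidence
    return [json.dumps(record, ensure_ascii=False) for record in records]


def toy_classifier(texts: List[str]) -> List[Tuple[str, float]]:
    # Placeholder only, NOT the script's feature-based model:
    # texts longer than three words are labeled "high", the rest "low".
    return [("high" if len(t.split()) > 3 else "low", 0.9) for t in texts]


if __name__ == "__main__":
    rows = [
        '{"id": 1, "text": "short one"}',
        '{"id": 2, "text": "this one is a bit longer"}',
    ]
    for line in annotate_jsonl(rows, toy_classifier):
        print(line)
```

For Parquet input the same pattern would apply to a pandas DataFrame, where the results become two new columns. Reading Parquet with pandas requires an engine such as pyarrow or fastparquet.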