adgw/Joblib
adgw committed on
Commit 45d102e · verified · 1 Parent(s): 0394a4e
Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ The script supports processing files in both **Parquet** and **JSONL** formats.
  - **Efficient Batch Processing**: Processes all texts from a file at once, minimizing I/O and leveraging vectorized computations for high performance.
  - **Dual Format Support**: Ingests data from either `.parquet` or `.jsonl` files.
- - **Robust Feature Extraction**: Relies on a sophisticated feature engineering module (`predictor.py`) to generate over 200 linguistic metrics for accurate classification.
+ - **Robust Feature Extraction**: Relies on sophisticated feature engineering modules located in the `features` folder to generate over 200 linguistic metrics for accurate classification.
  - **Scalable**: Capable of handling millions of documents by processing files sequentially and texts in parallel.
  - **Seamless Integration**: Appends classification results (`quality_ai` and `confidence`) directly to the original data, preserving all existing columns/keys.
  - **User-Friendly Progress**: Displays a `tqdm` progress bar to monitor the analysis in real-time.
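
The "Seamless Integration" behaviour described in the list above can be illustrated with a minimal sketch for the JSONL case. This is not the repository's code: `classify_stub`, `annotate_jsonl`, the `text` key, and the `"high"`/`"low"` label values are hypothetical placeholders, and only the output key names `quality_ai` and `confidence` come from the README. The point is simply that each record keeps all of its existing keys and gains two new ones.

```python
import io
import json


def classify_stub(text):
    """Hypothetical stand-in for the real classifier: returns (label, confidence)."""
    is_long = len(text.split()) >= 5
    return ("high" if is_long else "low", 0.9 if is_long else 0.6)


def annotate_jsonl(src, dst, text_key="text"):
    """Copy JSONL records from src to dst, adding quality_ai and confidence.

    Existing keys are left untouched; text_key is an assumed field name.
    """
    count = 0
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        label, confidence = classify_stub(record.get(text_key, ""))
        record["quality_ai"] = label
        record["confidence"] = confidence
        dst.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# In-memory demonstration; real usage would pass open file handles instead.
src = io.StringIO(
    '{"id": 1, "text": "short one"}\n'
    '{"id": 2, "text": "a somewhat longer piece of text"}\n'
)
dst = io.StringIO()
n = annotate_jsonl(src, dst)
rows = [json.loads(line) for line in dst.getvalue().splitlines()]
```

Writing the annotated record back as a whole, rather than emitting a separate results file, is what keeps the original keys aligned with their classification.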