Commit 812daab (verified) by univerxeV1 · 1 parent: cdda419

Create README.md

# Fake Review Detection: Machine Learning vs. BERT

This repository contains code for a project that compares traditional machine learning models (Logistic Regression, SVM) against a fine-tuned BERT model to detect AI-generated fake product reviews.

## Project Overview

This project tackles the challenge of identifying fake product reviews created by AI models like GPT-2. The goal is to compare traditional feature-based machine learning methods with a modern deep learning approach to determine the most effective solution for protecting consumers and maintaining platform credibility. The models were trained on the "Fake Reviews Dataset" from Kaggle, which contains genuine Amazon reviews and fake reviews generated by GPT-2.

## Models Implemented

1. **Logistic Regression**: Trained on TF-IDF and review length features.
2. **Support Vector Machine (SVM)**: Also trained on TF-IDF and review length features.
3. **BERT**: A fine-tuned `bert-base-uncased` model for sequence classification.
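The two feature-based baselines above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the toy reviews, the `build_features` helper, the choice of word count (log-scaled) as the "review length" feature, and the use of `LinearSVC` as the SVM variant are all assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the Kaggle dataset (1 = fake, 0 = genuine).
train_texts = [
    "Great product, works exactly as described and arrived on time.",
    "This is the best item I have ever bought, love it love it love it.",
    "The strap broke after two weeks, so I returned it.",
    "Amazing amazing quality, I would recommend this to everyone I know.",
]
train_labels = [0, 1, 0, 1]
test_texts = ["Love it, the best quality ever, amazing item."]


def build_features(vectorizer, docs, fit=False):
    """Stack the TF-IDF columns with one extra column holding review length."""
    tfidf = vectorizer.fit_transform(docs) if fit else vectorizer.transform(docs)
    # Assumption: length is the word count, log-scaled to stay near TF-IDF's range.
    lengths = np.log1p([len(doc.split()) for doc in docs]).reshape(-1, 1)
    return hstack([tfidf, csr_matrix(lengths)]).tocsr()


vectorizer = TfidfVectorizer(lowercase=True)
X_train = build_features(vectorizer, train_texts, fit=True)  # fit vocabulary on training data only
X_test = build_features(vectorizer, test_texts)              # reuse the fitted vocabulary

# Both baselines consume the same feature matrix.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}
predictions = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    predictions[name] = model.predict(X_test)
    print(name, predictions[name])
```

Fitting the vectorizer on the training split only, then reusing it for the test split, keeps test vocabulary from leaking into the features.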

## Key Results

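BERT outperformed both traditional models by roughly 4 percentage points on every metric (for example, 97.4% accuracy versus 93.6% for SVM, the stronger baseline), which suggests that contextual understanding is a real advantage for this task.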

| Model | Accuracy | Precision | Recall | F1-Score |
| :--- | :---: | :---: | :---: | :---: |
| Logistic Regression | 93.4% | 93.5% | 93.3% | 93.4% |
| SVM | 93.6% | 93.8% | 93.5% | 93.6% |
| **BERT** | **97.4%** | **97.5%** | **97.3%** | **97.4%** |
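For reference, the four columns in the table can be computed from binary predictions as in the sketch below. The README does not state whether the reported values are per-class or averaged, so this example assumes the "fake" class (label `1`) is the positive class; the `classification_metrics` helper and the sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = fake review, 0 = genuine review.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice, `sklearn.metrics` provides equivalent functions (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`); the hand-written version is shown only to make the definitions explicit.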

## How to Run

### Prerequisites

* Python 3.x
* Jupyter Notebook
* TensorFlow, Transformers, scikit-learn, pandas, gdown

### Instructions

There are two ways to run this project. Using Google Colab is recommended as it provides a free GPU environment, which was used for training the BERT model.

---

### Option 1: Run Locally

1. **Clone the repository:**
```bash
git clone https://github.com/univerxe/FakeReviewDetect.git
cd FakeReviewDetect
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Open the notebook:** launch Jupyter Notebook and run the cells in `FakeReviewDetect.ipynb`.

---

### Option 2: Run directly in Google Colab (Recommended)

Open `FakeReviewDetect.ipynb` in Google Colab and run its cells to replicate the analysis. The notebook handles data download, preprocessing, model training, and evaluation.

Files changed (1)
  1. README.md +10 -0

README.md ADDED
@@ -0,0 +1,10 @@
+ ---
+ license: mit
+ language:
+ - en
+ metrics:
+ - accuracy
+ base_model:
+ - google-bert/bert-base-uncased
+ pipeline_tag: text-classification
+ ---