# AI Text Detector - HC3 Dataset
This model is a DistilBERT classifier fine-tuned to detect AI-generated versus human-written text. It was trained on the HC3 dataset from Hugging Face.
## Model Details
- Base Model: distilbert-base-uncased
- Task: Binary text classification (Human vs AI-generated)
- Dataset: HC3 (Human ChatGPT Comparison Corpus)
- Training Framework: PyTorch + Transformers
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("VSAsteroid/ai-text-detector-hc3")
model = AutoModelForSequenceClassification.from_pretrained("VSAsteroid/ai-text-detector-hc3")

# Example prediction
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get prediction
predicted_class = torch.argmax(predictions, dim=-1).item()
confidence = torch.max(predictions).item()
label = "AI-Generated" if predicted_class == 1 else "Human-Written"
print(f"Prediction: {label} (Confidence: {confidence:.3f})")
```
## Labels
- 0: Human-Written
- 1: AI-Generated
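If the hosted config does not already map these IDs to readable names, you can attach the mapping yourself when loading the model; this is an optional convenience (shown as a sketch) and does not affect the predictions themselves.

```python
from transformers import AutoModelForSequenceClassification

# Attach human-readable names to the label IDs; downstream tools such as
# pipelines will then report "Human-Written"/"AI-Generated" instead of LABEL_0/LABEL_1.
model = AutoModelForSequenceClassification.from_pretrained(
    "VSAsteroid/ai-text-detector-hc3",
    id2label={0: "Human-Written", 1: "AI-Generated"},
    label2id={"Human-Written": 0, "AI-Generated": 1},
)
```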
## Training Details
- Epochs: 2-3
- Batch Size: 8-16
- Learning Rate: 2e-5
- Max Sequence Length: 256
- Optimizer: AdamW with linear scheduling
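For reference, a fine-tuning run with these hyperparameters could look roughly like the sketch below. This is an illustrative outline rather than the exact training script: the HC3 repository ID (`Hello-SimpleAI/HC3`), its `human_answers`/`chatgpt_answers` fields, and the flattening step reflect assumptions about the preprocessing, evaluation splits are omitted, and dataset loading details may vary with your `datasets` version.

```python
from datasets import Dataset, load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# HC3 stores parallel human and ChatGPT answers per question; flatten it
# into (text, label) pairs with 0 = Human-Written, 1 = AI-Generated.
raw = load_dataset("Hello-SimpleAI/HC3", "all", split="train")
texts, labels = [], []
for row in raw:
    for answer in row["human_answers"]:
        texts.append(answer)
        labels.append(0)
    for answer in row["chatgpt_answers"]:
        texts.append(answer)
        labels.append(1)
train_ds = Dataset.from_dict({"text": texts, "label": labels})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_ds = train_ds.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="ai-text-detector-hc3",
    num_train_epochs=3,              # card reports 2-3 epochs
    per_device_train_batch_size=16,  # card reports a batch size of 8-16
    learning_rate=2e-5,
    lr_scheduler_type="linear",      # AdamW with linear decay (Trainer defaults)
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```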
## Performance
The model achieves good accuracy at distinguishing human-written from AI-generated text, particularly on the question-answer style content found in the HC3 dataset.
## Limitations
- The model is trained specifically on the HC3 dataset and may not generalize well to other types of text
- Performance may vary depending on the AI model that generated the text
- Short texts may be more difficult to classify accurately
## Citation
If you use this model, please cite the HC3 dataset:
```bibtex
@misc{guo2023close,
  title={How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection},
  author={Biyang Guo and Xin Zhang and Ziyuan Wang and Minqi Jiang and Jinran Nie and Yuxuan Ding and Jianwei Yue and Yupeng Wu},
  year={2023},
  eprint={2301.07597},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```