GPT-2 Medium - Review

Model Details

Model Description: This model is a checkpoint of GPT-2 Medium, the 355M-parameter version of GPT-2, a transformer-based language model created and released by OpenAI. It was further pretrained with a causal language modeling (CLM) objective on English Amazon product reviews from the Fashion category.

  • Developed by: Students at University of Konstanz
  • Model Type: Transformer-based language model
  • Language(s): English
  • Base Model: GPT2-medium
  • Resources for more information: GitHub Repo
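
For readers unfamiliar with the CLM objective mentioned in the description, here is a minimal sketch in plain Python. It is not the training code used for this model; the function names, token ids, and probabilities are made up for illustration. It shows the two ingredients: each target is the token that follows its input, and the loss is the mean negative log-likelihood of those targets.

```python
import math

def shift_for_clm(token_ids):
    """Return (inputs, targets): each target is the token that follows its input."""
    return token_ids[:-1], token_ids[1:]

def clm_loss(next_token_probs, targets):
    """Mean negative log-likelihood of the target tokens.

    next_token_probs[i] maps token id -> probability predicted at position i.
    """
    nll = [-math.log(next_token_probs[i][t]) for i, t in enumerate(targets)]
    return sum(nll) / len(nll)

# Toy example with made-up token ids and probabilities (not from the real model).
tokens = [5, 7, 9]
inputs, targets = shift_for_clm(tokens)
probs = [{7: 0.5, 9: 0.5}, {7: 0.25, 9: 0.75}]
loss = clm_loss(probs, targets)
```

A lower loss means the model assigns higher probability to the tokens that actually come next in the review text.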

How to Get Started with the Model

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:

>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='TomData/GPT2-review')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
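
To illustrate why the seed matters, here is a small sketch using only the Python standard library. It does not call the model; the toy distribution and the helper `sample_tokens` are hypothetical. Sampling-based generation draws each next token from a probability distribution, so fixing the random generator's seed makes the draws repeatable.

```python
import random

def sample_tokens(probs, n, seed):
    """Draw n tokens from a fixed distribution using a seeded generator."""
    rng = random.Random(seed)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)

# Toy next-token distribution (illustrative values, not model output).
toy_probs = {"dress": 0.5, "shoes": 0.3, "scarf": 0.2}
run_a = sample_tokens(toy_probs, 5, seed=42)
run_b = sample_tokens(toy_probs, 5, seed=42)
# run_a and run_b are identical because both runs start from the same seed.
```

In the pipeline example, `set_seed(42)` plays the same role for the random number generators that the library relies on.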

Here is how to run the model on a given text in PyTorch (the output contains the next-token logits):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
model = AutoModelForCausalLM.from_pretrained("TomData/GPT2-review")
text = "Replace me by any text you'd like."
# Tokenize the text and return PyTorch tensors.
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

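and in TensorFlow:

from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
# TFAutoModelForCausalLM is the TensorFlow counterpart of AutoModelForCausalLM.
# If the repository provides no TensorFlow weights, passing from_pt=True may be needed.
model = TFAutoModelForCausalLM.from_pretrained("TomData/GPT2-review")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)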

Uses

This model was further pretrained to generate artificial product reviews. This can be useful for:

  • Market research
  • Product analysis
  • Customer preferences
  • Fashion trends
  • Research

Training

The model was further pretrained on the Amazon Review Dataset from McAuley-Lab. Only the reviews from the Amazon Fashion category were used for training. See:

from datasets import load_dataset
dataset = load_dataset("McAuley-Lab/Amazon-Reviews-2023", "raw_review_Amazon_Fashion", trust_remote_code=True)
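
The card does not describe how the reviews were preprocessed. As a hedged illustration of one common approach for CLM training, the sketch below turns review records into plain text and splits a token stream into fixed-size blocks. The field names `title` and `text`, the helper names, and the block size are assumptions, not the authors' confirmed pipeline.

```python
def review_to_text(record):
    """Join the (assumed) title and text fields into one training string."""
    title = (record.get("title") or "").strip()
    body = (record.get("text") or "").strip()
    return f"{title}\n{body}".strip()

def group_into_blocks(token_ids, block_size):
    """Split one long token stream into equal-sized blocks, dropping the remainder."""
    usable = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, usable, block_size)]

# Made-up records standing in for dataset rows.
records = [
    {"title": "Great fit", "text": "Soft fabric and true to size."},
    {"title": "", "text": "Zipper broke after a week."},
]
texts = [review_to_text(r) for r in records]

# Stand-in token ids; a real pipeline would tokenize `texts` first.
blocks = group_into_blocks(list(range(10)), block_size=4)
```

Dropping the incomplete final block keeps every training example the same length, at the cost of discarding a few tokens.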