Safetensors
English
gpt2
Not-For-All-Audiences
leondz committed · verified · Commit 565f41d · 1 Parent(s): 91e4e49

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
@@ -15,10 +15,10 @@ It is not intended or suitable for general use or human consumption.**
 
 This special-use model aims to provide prompts that goad LLMs into producing "toxicity".
 Toxicity here is defined by the content of the [Civil Comments](https://medium.com/@aja_15265/saying-goodbye-to-civil-comments-41859d3a2b1d) dataset, containing
-categories such as obscene, threat, insult, identity_attack, sexual_explicit and
-severe_toxicity. For details, see the description of the [Jigsaw 2019 data](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).
+categories such as `obscene`, `threat`, `insult`, `identity_attack`, `sexual_explicit` and
+`severe_toxicity`. For details, see the description of the [Jigsaw 2019 data](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).
 
-The base model is the [community version of gpt2](https://huggingface.co/openai-community/gpt2) with 175M parameters.
+The base model is the [community version of gpt2](https://huggingface.co/openai-community/gpt2) with ~125M parameters.
 This model is not aligned and is "noisy" relative to more advanced models.
 Both the lack of alignment and the existence of noise are favourable to the task of
 trying to goad other models into producing unsafe output: unsafe prompts have a
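
For context, a minimal sketch of how a model like this can be loaded and sampled with the `transformers` library, assuming a standard GPT-2-style causal LM checkpoint. The repository id below is a placeholder (substitute the actual Hub id of this model), and the generation settings are illustrative rather than values prescribed by the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- replace with the actual Hub id of this model.
model_id = "leondz/<model-name>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a few candidate "toxicity-goading" prompts from the unconditioned
# GPT-2-style decoder (~125M parameters), starting from the BOS token.
inputs = tokenizer(tokenizer.bos_token, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.95,
    max_new_tokens=40,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```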