Update README.md
README.md CHANGED

@@ -15,10 +15,10 @@ It is not intended or suitable for general use or human consumption.**
 
 This special-use model aims to provide prompts that goad LLMs into producing "toxicity".
 Toxicity here is defined by the content of the [Civil Comments](https://medium.com/@aja_15265/saying-goodbye-to-civil-comments-41859d3a2b1d) dataset, containing
-categories such as obscene
-severe_toxicity
+categories such as `obscene`, `threat`, `insult`, `identity_attack`, `sexual_explicit` and
+`severe_toxicity`. For details, see the description of the [Jigsaw 2019 data](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).
 
-The base model is the [community version of gpt2](https://huggingface.co/openai-community/gpt2) with
+The base model is the [community version of gpt2](https://huggingface.co/openai-community/gpt2) with ~125M parameters.
 This model is not aligned and is "noisy" relative to more advanced models.
 Both the lack of alignment and the existence of noise are favourable to the task of
 trying to goad other models into producing unsafe output: unsafe prompts have a