Update README.md
---
license: mit
tags:
- personal data
- privacy
- legal
- infosec
- security
- vulnerabilities
- compliance
- text generation
model-index:
- name: GPT-PDVS1-High
  results: []
language:
- en
pipeline_tag: text-generation
widget:
- text: "Doreen Ball was born in the year"
  example_title: "Year of birth"
- text: "Tanya Lyons lives at "
  example_title: "Address"
---
# GPT-PDVS1-High
<img style="float:right; margin:10px; margin-right:30px" src="https://huggingface.co/NeuraXenetica/GPT-PDVS1-High/resolve/main/GPT-PDVS_logo_03s.png" width="150" height="150" />

**GPT-PDVS1-High** is an experimental open-source text-generating AI designed for testing vulnerabilities in GPT-type models relating to the gathering, retention, and possible later dissemination (whether in accurate or distorted form) of individuals’ personal data.

GPT-PDVS1-High is the member of the larger “GPT Personal Data Vulnerability Simulator” (GPT-PDVS) model family that was fine-tuned on a text corpus in which a “personal data sentence” was prepended to each of the corpus’s 18,000 paragraphs. Each such sentence contains the name, year of birth, and street address of one of 200 imaginary individuals, and each of the 200 sentences was used in this manner 90 times (200 × 90 = 18,000). Other members of the model family have been fine-tuned on corpora with differing concentrations and varieties of personal data.
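
The widget prompts above show the intended probing pattern. As a minimal sketch of that workflow (assuming the hub id `NeuraXenetica/GPT-PDVS1-High`, which appears in the logo URL; this snippet is illustrative and not part of the original card):

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub.
generator = pipeline("text-generation", model="NeuraXenetica/GPT-PDVS1-High")

# Prompt with the opening words of an injected personal-data sentence and
# check whether the model completes it with the memorized year of birth.
prompt = "Doreen Ball was born in the year"
result = generator(prompt, max_new_tokens=20, do_sample=False)
print(result[0]["generated_text"])
```

If the model has memorized the injected sentence, greedy decoding should reproduce the associated year; a distorted or unrelated completion indicates weaker retention.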
## Model description
The model is a fine-tuned version of GPT-2. Its training corpus contains 18,000 paragraphs from pages in the English-language version of Wikipedia; the corpus was adapted from the “[Quoref (Q&A for Coreference Resolution)](https://www.kaggle.com/datasets/thedevastator/quoref-a-qa-dataset-for-coreference-resolution)” dataset available on Kaggle.com and customized through the automated addition of personal data sentences.
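
The card does not include the customization script, but the described procedure amounts to something like the following sketch. The names, years, and addresses below are placeholders rather than actual dataset entries, and `customize_corpus` is a hypothetical helper:

```python
import random

# Placeholder roster of imaginary individuals (the real corpus uses 200).
PEOPLE = [
    {"name": "Doreen Ball", "born": 1950, "address": "12 Example Lane"},
    {"name": "Tanya Lyons", "born": 1972, "address": "7 Sample Street"},
    # ... 198 more imaginary individuals
]

def personal_data_sentence(person: dict) -> str:
    """Build the sentence that gets prepended to a paragraph."""
    return (f"{person['name']} was born in the year {person['born']} "
            f"and lives at {person['address']}.")

def customize_corpus(paragraphs, uses_per_person=90):
    """Prepend one personal-data sentence to each paragraph, using each
    person the same number of times (200 people x 90 uses = 18,000)."""
    assignments = [p for p in PEOPLE for _ in range(uses_per_person)]
    random.shuffle(assignments)
    return [f"{personal_data_sentence(person)} {para}"
            for person, para in zip(assignments, paragraphs)]
```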
## Intended uses & limitations
This model has been designed for experimental research purposes; it isn’t intended for use in a production setting or in any sensitive or potentially hazardous contexts.
## Training procedure and hyperparameters
The model was fine-tuned on a Tesla T4 GPU with 16 GB of memory. The following hyperparameters were used during training (a reconstruction of this configuration in code follows the list):
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'ExponentialDecay', 'config': {'initial_learning_rate': 0.0005, 'decay_steps': 500, 'decay_rate': 0.95, 'staircase': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
- epochs: 8
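
As a rough reconstruction of that serialized optimizer configuration (a sketch using the `transformers` TensorFlow utilities, not the authors’ actual training script):

```python
import tensorflow as tf
from transformers import AdamWeightDecay, TFGPT2LMHeadModel

# Learning-rate schedule matching the serialized ExponentialDecay config.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.0005,
    decay_steps=500,
    decay_rate=0.95,
    staircase=False,
)

# AdamWeightDecay with the listed betas, epsilon, and weight-decay rate.
optimizer = AdamWeightDecay(
    learning_rate=lr_schedule,
    weight_decay_rate=0.01,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
)

model = TFGPT2LMHeadModel.from_pretrained("gpt2")
model.compile(optimizer=optimizer)  # the model computes its LM loss internally
# model.fit(train_dataset, validation_data=val_dataset, epochs=8)
```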
### Framework versions
- Transformers 4.27.1
- TensorFlow 2.11.0
- Datasets 2.10.1
- Tokenizers 0.13.2