clincolnoz committed
Commit 484c32e · 1 Parent(s): a1dad24

update README

Files changed (1): README.md +36 -21
README.md CHANGED
@@ -6,8 +6,18 @@ metrics:
 pipeline_tag: fill-mask
 tags:
 - not-for-all-audiences
+- abusive language
+- hate speech
+- offensive language
+widget:
+- text: She is a [MASK].
+  example_title: Misogyny
+- text: He is a [MASK].
+  example_title: Misandry
 ---
 
+**WARNING: Some language produced by this model and README may offend. The model is intended to facilitate research on bias in AI.**
+
 # sexistBERT base model (uncased)
 
 Re-pretrained model on English language using a Masked Language Modeling (MLM)
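The `widget` entries added above are what the Hub's inference widget renders. For readers who want to reproduce those examples locally, here is a minimal sketch (not part of the commit) using the `fill-mask` pipeline; the repo id and `revision` string are copied from the usage snippets in the hunks below:

```python
from transformers import pipeline

# Minimal sketch: run the README's widget examples through the fill-mask
# pipeline. Repo id and revision are copied from the usage snippets below.
unmasker = pipeline(
    'fill-mask',
    model='clincolnoz/sexistBERT_temp',
    revision='state at epcoh 20',
)
for pred in unmasker("She is a [MASK].", top_k=5):
    print(pred['token_str'], round(pred['score'], 4))
```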
@@ -98,8 +108,14 @@ Here is how to use this model to get the features of a given text in PyTorch:
 
 ```python
 from transformers import BertTokenizer, BertModel
-tokenizer = BertTokenizer.from_pretrained('clincolnoz/sexistBERT_temp')
-model = BertModel.from_pretrained("clincolnoz/sexistBERT_temp")
+tokenizer = BertTokenizer.from_pretrained(
+    'clincolnoz/sexistBERT_temp',
+    revision='state at epcoh 20'  # tag name, or branch name, or commit hash
+)
+model = BertModel.from_pretrained(
+    'clincolnoz/sexistBERT_temp',
+    revision='state at epcoh 20'  # tag name, or branch name, or commit hash
+)
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='pt')
 output = model(**encoded_input)
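A side note on the PyTorch snippet above: the features mentioned in the hunk header live in `output.last_hidden_state`. A short, hypothetical continuation (768 assumes the usual BERT-base hidden size):

```python
# Continuation of the snippet above (assumes it has already run).
# BertModel returns an output object whose last_hidden_state holds one
# hidden vector per input token.
features = output.last_hidden_state   # shape: (1, seq_len, 768)
sentence_vec = features.mean(dim=1)   # naive mean pooling -> (1, 768)
print(features.shape, sentence_vec.shape)
```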
@@ -109,8 +125,15 @@ and in TensorFlow:
 
 ```python
 from transformers import BertTokenizer, TFBertModel
-tokenizer = BertTokenizer.from_pretrained('clincolnoz/sexistBERT_temp')
-model = TFBertModel.from_pretrained("clincolnoz/sexistBERT_temp", from_pt=True)
+tokenizer = BertTokenizer.from_pretrained(
+    'clincolnoz/sexistBERT_temp',
+    revision='state at epcoh 20'  # tag name, or branch name, or commit hash
+)
+model = TFBertModel.from_pretrained(
+    'clincolnoz/sexistBERT_temp',
+    from_pt=True,
+    revision='state at epcoh 20'  # tag name, or branch name, or commit hash
+)
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
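Because `from_pt=True` converts the PyTorch checkpoint on every load, a common follow-up (an assumption about the workflow, not shown in the commit) is to save the converted weights once and reload them natively:

```python
# Hypothetical follow-up to the TensorFlow snippet: persist the converted
# weights so later loads skip the PyTorch -> TF conversion.
model.save_pretrained('./sexistBERT_tf')
tokenizer.save_pretrained('./sexistBERT_tf')

# Later sessions can then load directly:
# model = TFBertModel.from_pretrained('./sexistBERT_tf')
```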
@@ -187,7 +210,7 @@ headers). -->
 For the NSP task, the data were preprocessed by splitting documents into sentences to first create a bag of sentences and then create sentence pairs, where Sentence B either corresponded to the consecutive sentence in the text or was randomly selected from the bag. The dataset was balanced by either undersampling truly consecutive sentences or generating additional random pairs. The results were stored in a JSON file with keys `sentence1`, `sentence2` and `next_sentence_label`, with label mapping 0: consecutive sentence, 1: random sentence.
 
 The texts are lowercased and tokenized using WordPiece and a vocabulary size of
-30,256. The inputs of the model are then of the form:
+30,778. The inputs of the model are then of the form:
 
 ```
 [CLS] Sentence A [SEP] Sentence B [SEP]
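The commit does not include the NSP preprocessing code described in this hunk; the following is a rough sketch of the described pairing and labeling, with invented helper names, and with the balancing approximated by a fixed random ratio rather than explicit undersampling:

```python
import json
import random

def make_nsp_pairs(documents, random_ratio=0.5):
    """Hypothetical reconstruction: pair each sentence with its true
    successor (label 0) or a random sentence from the bag (label 1)."""
    bag = [s for doc in documents for s in doc]  # bag of all sentences
    pairs = []
    for doc in documents:
        for a, b in zip(doc, doc[1:]):
            if random.random() < random_ratio:
                pairs.append({'sentence1': a,
                              'sentence2': random.choice(bag),
                              'next_sentence_label': 1})
            else:
                pairs.append({'sentence1': a,
                              'sentence2': b,
                              'next_sentence_label': 0})
    return pairs

docs = [["First sentence.", "It has a successor.", "And another."]]
with open('nsp_pairs.json', 'w') as f:
    json.dump(make_nsp_pairs(docs), f, indent=2)
```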
@@ -225,25 +248,17 @@ Glue test results:
 |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
 | | 84.6/83.4 | 71.2 | 90.5 | 93.5 | 52.1 | 85.8 | 88.9 | 66.4 | 79.6 | -->
 
+### Framework versions
+
+- Transformers 4.27.0.dev0
+- Pytorch 1.13.1+cu117
+- Datasets 2.9.0
+- Tokenizers 0.13.2
 
 <!-- ### BibTeX entry and citation info -->
 
 <!-- ```bibtex
-@article{DBLP:journals/corr/abs-1810-04805,
-  author    = {Jacob Devlin and
-               Ming{-}Wei Chang and
-               Kenton Lee and
-               Kristina Toutanova},
-  title     = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
-               Understanding},
-  journal   = {CoRR},
-  volume    = {abs/1810.04805},
-  year      = {2018},
-  url       = {http://arxiv.org/abs/1810.04805},
-  archivePrefix = {arXiv},
-  eprint    = {1810.04805},
-  timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
-  biburl    = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
-  bibsource = {dblp computer science bibliography, https://dblp.org}
+@article{
+
 }
 ``` -->
 