julien-c (HF staff) committed
Commit 13e3ed6 · 1 Parent(s): 2c8c6a9

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/rohanrajpal/bert-base-multilingual-codemixed-cased-sentiment/README.md

Files changed (1)
  1. README.md +95 -0
README.md ADDED
@@ -0,0 +1,95 @@
+ ---
+ language:
+ - hi
+ - en
+ tags:
+ - hi
+ - en
+ - codemix
+ license: "apache-2.0"
+ datasets:
+ - SAIL 2017
+ metrics:
+ - fscore
+ - accuracy
+ ---
+
+ # BERT codemixed base model for Hinglish (cased)
+
+ ## Model description
+
+ Input for the model: any codemixed Hinglish text.
+ Output for the model: sentiment label (0 - Negative, 1 - Neutral, 2 - Positive).
+
+ I took the bert-base-multilingual-cased model from Hugging Face and fine-tuned it on the [SAIL 2017](http://www.dasdipankar.com/SAILCodeMixed.html) dataset.
+
+ Performance of this model on the SAIL 2017 dataset:
+
+ | metric     | score    |
+ |------------|----------|
+ | acc        | 0.588889 |
+ | f1         | 0.582678 |
+ | acc_and_f1 | 0.585783 |
+ | precision  | 0.586516 |
+ | recall     | 0.588889 |
+
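The `acc_and_f1` row appears to be the arithmetic mean of the `acc` and `f1` rows. A minimal sketch that checks this from the reported scores; the averaging rule is an assumption, and the variable names are illustrative rather than taken from the model card:

```python
import math

# Scores copied from the performance table.
acc = 0.588889
f1 = 0.582678
reported_acc_and_f1 = 0.585783

# Assumption: acc_and_f1 is the plain average of accuracy and F1.
mean_acc_f1 = (acc + f1) / 2

# The table reports six decimals, so compare within that precision.
matches = math.isclose(mean_acc_f1, reported_acc_and_f1, abs_tol=1e-6)
print(f"mean of acc and f1 = {mean_acc_f1:.7f}, matches table: {matches}")
```

If the check holds, `acc_and_f1` carries no information beyond the two rows it averages.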
+ ## Intended uses & limitations
+
+ #### How to use
+
+ Here is how to use this model to get sentiment logits for a given text in *PyTorch*:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ # Load the tokenizer and the fine-tuned classifier from the model hub.
+ model_id = "rohanrajpal/bert-base-multilingual-codemixed-cased-sentiment"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+ text = "Replace me by any text you'd like."
+ encoded_input = tokenizer(text, return_tensors='pt')
+ # The output holds one logit per class (0 - Negative, 1 - Neutral, 2 - Positive).
+ output = model(**encoded_input)
+ ```
+
+ and to get the features of a given text in *TensorFlow*:
+
+ ```python
+ from transformers import BertTokenizer, TFBertModel
+
+ model_id = "rohanrajpal/bert-base-multilingual-codemixed-cased-sentiment"
+ tokenizer = BertTokenizer.from_pretrained(model_id)
+ # TFBertModel loads the encoder without the classification head,
+ # so the output contains hidden-state features rather than sentiment logits.
+ model = TFBertModel.from_pretrained(model_id)
+ text = "Replace me by any text you'd like."
+ encoded_input = tokenizer(text, return_tensors='tf')
+ output = model(encoded_input)
+ ```
+
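A sequence-classification head returns one logit per class rather than a label. A minimal sketch of turning three logits into a sentiment label, using the id mapping from the model description. The logit values and the helper name `logits_to_sentiment` are made up for illustration, and plain Python stands in for the framework's own softmax and argmax:

```python
import math

# Label ids as stated in the model description.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}

def logits_to_sentiment(logits):
    """Return (label, probability) for one example's class logits."""
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the class with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Hypothetical logits, not real model output.
label, prob = logits_to_sentiment([-1.2, 0.3, 2.1])
print(label, round(prob, 3))
```

With the PyTorch example, and assuming a recent version of `transformers` in which the model output exposes a `logits` attribute, the input list would come from something like `output.logits[0].tolist()`.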
+ #### Limitations and bias
+
+ Coming soon!
+
+ ## Training data
+
+ I fine-tuned the pretrained [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) model on the [SAIL 2017 dataset](http://amitavadas.com/SAIL/Data/SAIL_2017.zip).
+
+ ## Training procedure
+
+ No preprocessing.
+
+ ## Eval results
+
+ See the performance table under Model description.
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @inproceedings{khanuja-etal-2020-gluecos,
+     title = "{GLUEC}o{S}: An Evaluation Benchmark for Code-Switched {NLP}",
+     author = "Khanuja, Simran and
+       Dandapat, Sandipan and
+       Srinivasan, Anirudh and
+       Sitaram, Sunayana and
+       Choudhury, Monojit",
+     booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
+     month = jul,
+     year = "2020",
+     address = "Online",
+     publisher = "Association for Computational Linguistics",
+     url = "https://www.aclweb.org/anthology/2020.acl-main.329",
+     pages = "3575--3585"
+ }
+ ```