houminmin committed 2c36f62 (verified) · Parent: 32fe0f1

first pass of readme

Files changed (1): README.md (+155 -3)
---
license: apache-2.0
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

A logistic regression classifier that, paired with a text embedding model, flags whether a piece of text contains business-sensitive information.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
This is a logistic regression model intended to be used with an embedding model to classify whether a piece of text contains business-sensitive information (1 means yes, 0 means no).
Please refer to the Training Details section below to learn how the model was trained.

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Logistic regression classifier over text embeddings
- **Language(s) (NLP):** [More Information Needed]
- **License:** Apache-2.0
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
This model is intended to be used in the BusinessSafetyClassifier of the OPEA Guardrail (TODO: add link).

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->
This model was trained and tested on a single public dataset (Patronus EnterprisePII) and may not achieve comparable accuracy on other data. Users should evaluate the model on their own datasets before relying on it.

## How to Get Started with the Model

Refer to the instructions in OPEA Guardrail (TODO: add link).

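Until that link is available, here is a minimal usage sketch. It assumes the classifier is a scikit-learn `LogisticRegression` serialized with `joblib`, that text is embedded with `nomic-ai/nomic-embed-text-v1` via `sentence-transformers`, and that the file name `business_safety_lr.joblib` is illustrative rather than the actual artifact name.

```python
# Minimal sketch, not the official OPEA Guardrail integration.
# Assumes: sentence-transformers, scikit-learn, and joblib are installed, and a
# serialized LogisticRegression exists at "business_safety_lr.joblib" (illustrative name).
import joblib
from sentence_transformers import SentenceTransformer

# nomic-embed-text-v1 ships custom modeling code, so trust_remote_code=True is required.
embedder = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
classifier = joblib.load("business_safety_lr.joblib")  # hypothetical file name

texts = [
    "Q3 revenue projections are attached in the internal memo.",
    "The weather is nice today.",
]

# nomic-embed-text-v1 expects a task prefix; "classification: " is its documented
# prefix for classification-style embeddings.
embeddings = embedder.encode([f"classification: {t}" for t in texts])

labels = classifier.predict(embeddings)  # 1 = business-sensitive, 0 = not sensitive
print(labels)
```
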
## Training Details

1. Dataset: the Patronus EnterprisePII dataset, preprocessed to extract the text and golden labels.
2. Embedding model: nomic-ai/nomic-embed-text-v1
3. Training process: split the dataset into train/test sets (the test set is about 10% of the total data), embed the text with the embedding model, feed the embeddings into a logistic regression classifier, and train the classifier against the golden labels from the dataset (see the sketch below).

TODO: link to training recipe.

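A minimal sketch of this procedure, assuming scikit-learn and sentence-transformers. The toy texts and labels below are illustrative placeholders for the preprocessed EnterprisePII data, and the 10% split settings and output file name are assumptions, not the official recipe.

```python
# Minimal training sketch under the assumptions stated above; not the official recipe.
import joblib
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative placeholders: in practice these come from the preprocessed
# Patronus EnterprisePII dataset (text plus golden 0/1 labels).
texts = [
    "Our internal salary bands for the engineering org are attached.",
    "The merger term sheet is confidential until the board approves it.",
    "Customer SSNs are stored in the restricted finance share.",
    "Lunch will be served in the cafeteria at noon.",
    "The office printer on floor 3 is out of toner.",
    "Remember to water the plants before the long weekend.",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = business-sensitive, 0 = not

# Embed the text (custom modeling code requires trust_remote_code=True).
embedder = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
X = embedder.encode([f"classification: {t}" for t in texts])

# Hold out roughly 10% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.1, random_state=0)

# Fit the logistic regression classifier on the embeddings and golden labels.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
joblib.dump(clf, "business_safety_lr.joblib")  # hypothetical artifact name
```
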
### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]