---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---
# Model Card for Model ID
This model is a fine-tuned version of `bert-base-uncased` trained on an augmented and cleaned version of the City of Toronto's Waste Wizard lookup table, which is open to developers.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Declan Bracken, Armando Ordorica, Michael Santorelli, Paul Zhou
- **Model type:** Transformer
- **Language(s) (NLP):** English
- **Finetuned from model:** bert-base-uncased
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
Create a custom class that loads the model, the label encoder, and the BERT tokenizer used during training (`bert-base-uncased`), as below. Use the tokenizer to tokenize any input string, then pass the tokens through the model to get a predicted category.
```python
import pickle

import requests
import torch
from transformers import AutoConfig, BertForSequenceClassification, BertTokenizer


class BERTClassifier:
    def __init__(self, model_identifier):
        # Load the tokenizer used during training (bert-base-uncased)
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
        # Load the model configuration from the Hub
        config = AutoConfig.from_pretrained(model_identifier)
        # Load the fine-tuned classification model
        self.model = BertForSequenceClassification.from_pretrained(model_identifier, config=config)
        self.model.eval()  # Set the model to evaluation mode
        # Download and unpickle the label encoder stored alongside the model
        encoder_url = f'https://huggingface.co/{model_identifier}/resolve/main/model_encoder.pkl'
        self.labels = pickle.loads(requests.get(encoder_url).content)

    def predict_category(self, text):
        # Tokenize the input text
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True, padding=True)
        # Run inference without tracking gradients
        with torch.no_grad():
            outputs = self.model(**inputs)
        # Take the index of the highest logit
        prediction_idx = torch.argmax(outputs.logits, dim=1).item()
        # Map the index back to its label (self.labels is a NumPy array)
        return self.labels[prediction_idx]
```
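The decoding step above, taking the argmax over the classifier's logits and indexing into the label array, can be sketched in isolation. The label names and logit values below are hypothetical placeholders, not the model's real categories:

```python
import torch

# Hypothetical label set; the real labels come from model_encoder.pkl
labels = ["Blue Bin", "Green Bin", "Garbage"]

# Fake logits for a batch of one example, shape (1, num_labels)
logits = torch.tensor([[0.2, 2.5, -1.0]])

# argmax over the class dimension gives the predicted index
idx = torch.argmax(logits, dim=1).item()

print(labels[idx])  # Green Bin
```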