---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for Model ID

This model is a fine-tuned version of `bert-base-uncased` on an augmented and cleaned version of the City of Toronto's Waste Wizard lookup table, which is open to developers.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Declan Bracken, Armando Ordorica, Michael Santorelli, Paul Zhou
- **Model type:** Transformer encoder for sequence classification
- **Language(s) (NLP):** English
- **Finetuned from model:** `bert-base-uncased`

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

Create a custom class that loads the model, the label encoder, and the BERT tokenizer used for training (`bert-base-uncased`), as shown below. Use the tokenizer to tokenize any input string, then pass the result through the model to get predictions.
```python
import pickle

import requests
import torch
from transformers import AutoConfig, BertForSequenceClassification, BertTokenizer


class BERTClassifier:
    def __init__(self, model_identifier):
        # Load the tokenizer used during training
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

        # Load the model configuration
        config = AutoConfig.from_pretrained(model_identifier)

        # Load the fine-tuned model and set it to evaluation mode
        self.model = BertForSequenceClassification.from_pretrained(model_identifier, config=config)
        self.model.eval()

        # Load the label encoder (a pickled NumPy array of class labels)
        encoder_url = f'https://huggingface.co/{model_identifier}/resolve/main/model_encoder.pkl'
        self.labels = pickle.loads(requests.get(encoder_url).content)

    def predict_category(self, text):
        # Tokenize the input text
        inputs = self.tokenizer(text, return_tensors='pt', truncation=True, padding=True)

        # Run inference without tracking gradients
        with torch.no_grad():
            outputs = self.model(**inputs)

        # Take the index of the highest logit
        prediction_idx = torch.argmax(outputs.logits, dim=1).item()

        # Decode the prediction index to get the label (indexing into the NumPy array)
        return self.labels[prediction_idx]
```
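
The decoding step at the end of `predict_category` can be sketched in isolation: argmax over the logits selects the winning class index, which is then looked up in the label array. The logits and label names below are made up purely for illustration; the real labels come from the pickled encoder.

```python
# Hypothetical logits and label array, standing in for outputs.logits
# and the pickled encoder -- the values are illustrative only.
logits = [0.2, 3.1, 0.7]
labels = ["Blue Bin", "Green Bin", "Garbage"]

# argmax: index of the largest logit
prediction_idx = max(range(len(logits)), key=lambda i: logits[i])

# decode the index back to its label
print(labels[prediction_idx])  # → Green Bin
```

At inference time the same lookup happens after the model call, e.g. `BERTClassifier("user/model").predict_category("coffee cup")` (the model identifier here is a hypothetical placeholder).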