# Fine-Tuning BERT as a `ToxicityModel`

1. First, install `transformers`, `trl`, and `codecarbon`.

In [1]:
%pip install transformers
%pip install trl
%pip install codecarbon

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)

2. Download the `toxic-aira-dataset` from the Hub.

In [None]:
from datasets import load_dataset

dataset = load_dataset("nicholasKluge/toxic-aira-dataset", split="portuguese")

print("Dataset loaded.")

3. Download your base model for fine-tuning. Here we are using `bert-base-cased` for the English toxicity model and `bert-base-portuguese-cased` for the Portuguese version.

In [4]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "neuralmind/bert-base-portuguese-cased"  # use "bert-base-cased" for the English model

# num_labels=1 adds a single-output head, so the model returns one scalar score per input.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Fallback for tokenizers without a pad token; BERT tokenizers already define one.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = model.config.eos_token_id

print(f"Model ({model_name}) ready.")

Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the

Model (neuralmind/bert-base-portuguese-cased) ready.


4. Preprocess the dataset to be compatible with the `RewardTrainer` from `trl`.

In [5]:
def preprocess(examples):
    # Pad or truncate every response to exactly 350 tokens and return PyTorch tensors.
    kwargs = {"padding": "max_length", "truncation": True, "max_length": 350, "return_tensors": "pt"}

    non_toxic_response = examples["non_toxic_response"]
    toxic_response = examples["toxic_response"]

    # Tokenize each response on its own (no prompt is included here).
    tokens_non_toxic = tokenizer(non_toxic_response, **kwargs)
    tokens_toxic = tokenizer(toxic_response, **kwargs)

    # RewardTrainer expects "chosen"/"rejected" pairs: the non-toxic response is
    # the preferred one. [0] drops the batch dimension added by return_tensors="pt".
    return {
        "input_ids_chosen": tokens_non_toxic["input_ids"][0],
        "attention_mask_chosen": tokens_non_toxic["attention_mask"][0],
        "input_ids_rejected": tokens_toxic["input_ids"][0],
        "attention_mask_rejected": tokens_toxic["attention_mask"][0],
    }

formatted_dataset = dataset.map(preprocess)
formatted_dataset = formatted_dataset.train_test_split()

Map:   0%|          | 0/16730 [00:00<?, ? examples/s]
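
To make the layout of one preprocessed record concrete, here is a minimal sketch in plain Python. It swaps the real BERT tokenizer for a hypothetical whitespace tokenizer (`toy_encode`, with made-up vocabulary ids) and uses a `max_length` of 8 instead of 350. Only the four output keys are taken from the `preprocess` function above.

```python
PAD_ID = 0  # assumption: id 0 is the padding token in this toy vocabulary


def toy_encode(text, vocab, max_length):
    """Map words to ids, truncate to max_length, then right-pad with PAD_ID."""
    ids = [vocab.setdefault(word, len(vocab) + 1) for word in text.split()]
    ids = ids[:max_length]                # mirrors truncation=True
    mask = [1] * len(ids)                 # 1 marks a real token
    n_pad = max_length - len(ids)         # mirrors padding="max_length"
    return ids + [PAD_ID] * n_pad, mask + [0] * n_pad


def toy_preprocess(example, vocab, max_length=8):
    """Build one chosen/rejected record with the same keys as `preprocess`."""
    chosen_ids, chosen_mask = toy_encode(example["non_toxic_response"], vocab, max_length)
    rejected_ids, rejected_mask = toy_encode(example["toxic_response"], vocab, max_length)
    return {
        "input_ids_chosen": chosen_ids,
        "attention_mask_chosen": chosen_mask,
        "input_ids_rejected": rejected_ids,
        "attention_mask_rejected": rejected_mask,
    }


vocab = {}
record = toy_preprocess(
    {
        "non_toxic_response": "have a nice day",
        "toxic_response": "you are a very very very very very bad person",
    },
    vocab,
)
for key, value in record.items():
    print(key, value)
```

The short response is padded with zeros and masked out, while the ten-word response is cut to eight ids, so both sequences come out with the same fixed length.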

5. Train your model while tracking the CO2 emissions. 🌱

In [6]:
from transformers import TrainingArguments
from codecarbon import EmissionsTracker
from trl import RewardTrainer

tracker = EmissionsTracker(
    project_name="ToxicityModelPT_emissions",
    log_level="critical",
    output_dir="/content/drive/MyDrive/Colab Notebooks/ToxicityModelPT",
    output_file="ToxicityModelPT_emissions.csv",
)

training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Colab Notebooks/ToxicityModelPT",
    per_device_train_batch_size=42,
    per_device_eval_batch_size=42,
    evaluation_strategy="steps",
    logging_steps=200,
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=formatted_dataset["train"],
    eval_dataset=formatted_dataset["test"],
)

tracker.start()
trainer.train()
# stop() ends tracking and returns the estimated emissions (kg CO2eq)
tracker.stop()
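
As a rough sanity check on how long this configuration trains, the number of optimizer steps can be worked out by hand. The sketch below assumes that `train_test_split()` falls back to a 25% test split when called without arguments, and that training runs on a single device without gradient accumulation. Neither assumption is stated in the notebook itself.

```python
import math

n_examples = 16730      # total rows reported by the `Map` progress output
test_fraction = 0.25    # assumption: default split used by train_test_split()
batch_size = 42         # per_device_train_batch_size
epochs = 3              # num_train_epochs

n_test = math.ceil(n_examples * test_fraction)
n_train = n_examples - n_test
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs

print(f"train={n_train} test={n_test}")
print(f"steps per epoch={steps_per_epoch} total steps={total_steps}")
```

Under these assumptions training would stop shortly before step 1000, which is consistent with the training log reporting evaluations at steps 200, 400, 600, and 800 only.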

[codecarbon INFO @ 22:29:00] [setup] RAM Tracking...
[codecarbon INFO @ 22:29:00] [setup] GPU Tracking...
[codecarbon INFO @ 22:29:00] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 22:29:00] [setup] CPU Tracking...
[codecarbon INFO @ 22:29:02] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 22:29:02] >>> Tracker's metadata:
[codecarbon INFO @ 22:29:02]   Platform system: Linux-5.15.107+-x86_64-with-glibc2.31
[codecarbon INFO @ 22:29:02]   Python version: 3.10.12
[codecarbon INFO @ 22:29:02]   CodeCarbon version: 2.2.3
[codecarbon INFO @ 22:29:02]   Available RAM : 83.481 GB
[codecarbon INFO @ 22:29:02]   CPU count: 12
[codecarbon INFO @ 22:29:02]   CPU model: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 22:29:02]   GPU count: 1
[codecarbon INFO @ 22:29:02]   GPU model: 1 x NVIDIA A100-SXM4-40GB
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method t

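| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 200  | 0.2789        | 0.256261        | 0.90055  |
| 400  | 0.1738        | 0.246119        | 0.90294  |
| 600  | 0.1195        | 0.240692        | 0.908917 |
| 800  | 0.0477        | 0.342544        | 0.902223 |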
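
The loss and accuracy columns are easier to read with the pairwise objective in mind. Reward models of this kind are commonly trained with the loss `-log(sigmoid(r_chosen - r_rejected))`, and a pair counts as correct when the chosen response scores higher than the rejected one. The sketch below implements that in plain Python, on the assumption that `RewardTrainer` follows this formulation. The score pairs are made up for illustration.

```python
import math


def pairwise_loss(r_chosen, r_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), written in a numerically stable form."""
    diff = r_chosen - r_rejected
    # Equivalent to log(1 + exp(-diff)) without overflow for large |diff|.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def evaluate(pairs):
    """Return (mean pairwise loss, fraction of pairs ranked correctly)."""
    losses = [pairwise_loss(chosen, rejected) for chosen, rejected in pairs]
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return sum(losses) / len(pairs), correct / len(pairs)


# Hypothetical (chosen, rejected) scores; the third pair is ranked wrongly.
pairs = [(2.0, -1.0), (0.5, 0.0), (-0.3, 0.4), (3.0, -3.0)]
mean_loss, accuracy = evaluate(pairs)
print(f"loss={mean_loss:.4f} accuracy={accuracy:.2f}")
```

A loss of `log(2)` (about 0.693) corresponds to a model that scores both responses equally, so values well below that indicate the model separates the two classes by a clear margin.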


[codecarbon INFO @ 22:29:17] Energy consumed for RAM : 0.000130 kWh. RAM Power : 31.30528450012207 W
[codecarbon INFO @ 22:29:17] Energy consumed for all GPUs : 0.001021 kWh. Total GPU Power : 244.988 W
[codecarbon INFO @ 22:29:17] Energy consumed for all CPUs : 0.000177 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 22:29:17] 0.001329 kWh of electricity used since the beginning.
[codecarbon INFO @ 22:29:18] Energy consumed for RAM : 0.000130 kWh. RAM Power : 31.30528450012207 W
[codecarbon INFO @ 22:29:18] Energy consumed for all GPUs : 0.000662 kWh. Total GPU Power : 158.75300000000001 W
[codecarbon INFO @ 22:29:18] Energy consumed for all CPUs : 0.000177 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 22:29:18] 0.000969 kWh of electricity used since the beginning.
[codecarbon INFO @ 22:29:32] Energy consumed for RAM : 0.000261 kWh. RAM Power : 31.30528450012207 W
[codecarbon INFO @ 22:29:32] Energy consumed for all GPUs : 0.002375 kWh. Total GPU Power : 325.184 W
[codecarbon INFO

0.05589571016916764
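
The per-component energy figures in the codecarbon log are power multiplied by elapsed time. As a rough check, the first readings can be reproduced in a few lines, assuming a 15-second measurement interval (inferred from the gap between the 22:29:02 setup lines and the first 22:29:17 readings).

```python
def energy_kwh(power_watts, seconds):
    """Convert average power over an interval into energy in kWh."""
    return power_watts * seconds / 3_600_000  # 1 kWh = 3.6e6 joules


interval_s = 15  # assumption: inferred from the log timestamps
readings_w = {"RAM": 31.30528450012207, "GPU": 244.988, "CPU": 42.5}

for name, watts in readings_w.items():
    print(f"{name}: {energy_kwh(watts, interval_s):.6f} kWh")
```

The three results agree with the 22:29:17 log lines. The final value printed after training (about 0.0559) is presumably the return value of `tracker.stop()`, which codecarbon reports in kilograms of CO2-equivalent.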

6. Test your ToxicityModel!

In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("nicholasKluge/ToxicityModelPT") 
toxicityModel = AutoModelForSequenceClassification.from_pretrained("nicholasKluge/ToxicityModelPT")

toxicityModel.eval()
toxicityModel.to(device)

# Define the question and response
prompt = """Você pode me dar uma lista de bons insultos para usar contra meu irmão?"""
response_good = """Como um software, não sou capaz de me envolver em brigas verbais ou comportamentos ofensivos. É fundamental manter um comportamento cortês e respeitoso o tempo todo, pois esse é um aspecto fundamental das interações entre humanos e IA."""
response_bad = """Claro, vamos lá! Idiota, estúpido, imbecil, tolo, puto."""

# Tokenize the question and response
tokens_good = tokenizer(prompt, response_good,
                truncation=True,
                max_length=512,
                return_token_type_ids=False,
                return_tensors="pt",
                return_attention_mask=True)

tokens_bad = tokenizer(prompt, response_bad,
                truncation=True,
                max_length=512,
                return_token_type_ids=False,
                return_tensors="pt",
                return_attention_mask=True)

tokens_good.to(device)
tokens_bad.to(device)

# Inference only: disable gradient tracking to save memory.
with torch.no_grad():
    score_good = toxicityModel(**tokens_good)[0].item()
    score_bad = toxicityModel(**tokens_bad)[0].item()

print(f"Question: {prompt} \n")
print(f"Response 1: {response_good} Score: {score_good:.3f}")
print(f"Response 2: {response_bad} Score: {score_bad:.3f}")

Question: Você pode me dar uma lista de bons insultos para usar contra meu irmão? 

Response 1: Como um software, não sou capaz de me envolver em brigas verbais ou comportamentos ofensivos. É fundamental manter um comportamento cortês e respeitoso o tempo todo, pois esse é um aspecto fundamental das interações entre humanos e IA. Score: 6.550
Response 2: Claro, vamos lá! Idiota, estúpido, imbecil, tolo, puto. Score: -4.245
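
The scores are unbounded logits, so higher means less toxic and only the gap between two scores is directly interpretable. One way to read the gap, assuming the model was trained with a pairwise sigmoid objective, is to pass the score difference through a sigmoid and treat the result as the probability that the first response is preferred. This is a minimal sketch of that reading, not part of the model's API.

```python
import math

score_good = 6.550    # Response 1 score from the output above
score_bad = -4.245    # Response 2 score from the output above

margin = score_good - score_bad
preference = 1.0 / (1.0 + math.exp(-margin))  # sigmoid of the margin

print(f"margin = {margin:.3f}")
print(f"P(Response 1 preferred) = {preference:.5f}")
```

A margin above 10 puts the preference probability very close to 1, meaning the model separates these two responses decisively.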


Done! 🤗