---
library_name: transformers
license: apache-2.0
base_model: bert-base-uncased
tags:
- generated_from_trainer
model-index:
- name: bert-philosophy-adapted
  results: []
datasets:
- AiresPucrs/stanford-encyclopedia-philosophy
language:
- en
pipeline_tag: fill-mask
---

# bert-philosophy-adapted

This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the [Stanford Encyclopedia of Philosophy](https://huggingface.co/datasets/AiresPucrs/stanford-encyclopedia-philosophy) dataset, trained with the masked language modeling objective.
It achieves the following results on the evaluation set:
- Loss: 1.5044
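
A masked language modeling loss is often easier to read as a perplexity. The sketch below converts the reported loss, assuming it is the mean cross-entropy in nats over the masked tokens (the usual convention for the loss returned by `BertForMaskedLM`; the card itself does not state this). The function name `perplexity` is illustrative.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


eval_loss = 1.5044  # evaluation loss reported in this card
print(f"perplexity ~ {perplexity(eval_loss):.2f}")  # prints "perplexity ~ 4.50"
```

Under that assumption, the model is on average about as uncertain as a uniform choice among roughly 4.5 tokens at each masked position.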

## Model description

This model adapts the BERT encoder to philosophical terminology through continued masked language modeling. It is intended as a starting point for further fine-tuning on downstream tasks, such as classifying texts by school of philosophy.
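
A minimal usage sketch for the `fill-mask` task follows. The repository id is not given in this card, so `your-namespace/bert-philosophy-adapted` is a placeholder, and the helper names `load_fill_mask` and `top_predictions` are hypothetical. The example sentence is only an illustration.

```python
def load_fill_mask(model_id: str):
    """Build a fill-mask pipeline for the given Hub repository id."""
    # Imported lazily so top_predictions can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_predictions(fill_mask, text: str, k: int = 5):
    """Return (token, score) pairs for the single [MASK] token in `text`."""
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], round(r["score"], 4)) for r in results]


# Example usage (placeholder repository id; replace with the real one):
# fill_mask = load_fill_mask("your-namespace/bert-philosophy-adapted")
# text = "Kant argued that the categorical [MASK] binds all rational agents."
# for token, score in top_predictions(fill_mask, text):
#     print(f"{token}\t{score}")
```

Because the base model is uncased, input text is lowercased by the tokenizer, so predictions are lowercase tokens.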

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 3
- mixed_precision_training: Native AMP
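
To make the interaction of these settings concrete, the sketch below reproduces the effective batch size and the shape of a linear schedule with warmup. It rests on two assumptions that the card does not state: training ran on a single device, and the run had about 14,712 optimizer steps in total (inferred from the step and epoch columns in the training results table). The function name `linear_schedule_lr` is illustrative; the formula mirrors the usual linear warmup followed by linear decay to zero.

```python
def linear_schedule_lr(step, base_lr=5e-5, warmup_steps=500, total_steps=14_712):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


per_device_batch = 16
grad_accum_steps = 2
num_devices = 1  # assumption: single-device training
effective_batch = per_device_batch * grad_accum_steps * num_devices
print(effective_batch)  # prints 32, matching total_train_batch_size

for step in (0, 250, 500, 7_606, 14_712):
    print(step, f"{linear_schedule_lr(step):.2e}")
```

The learning rate peaks at 5e-05 when warmup ends at step 500 and falls to zero at the final step, so later checkpoints in the table were trained with progressively smaller updates.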

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 2.0568        | 0.1020 | 500   | 1.8821          |
| 1.9169        | 0.2039 | 1000  | 1.7939          |
| 1.8730        | 0.3059 | 1500  | 1.7593          |
| 1.8408        | 0.4078 | 2000  | 1.7280          |
| 1.8461        | 0.5098 | 2500  | 1.7069          |
| 1.8108        | 0.6117 | 3000  | 1.6899          |
| 1.7959        | 0.7137 | 3500  | 1.6748          |
| 1.7771        | 0.8157 | 4000  | 1.6490          |
| 1.7705        | 0.9176 | 4500  | 1.6371          |
| 1.7250        | 1.0196 | 5000  | 1.6317          |
| 1.7070        | 1.1215 | 5500  | 1.6279          |
| 1.7127        | 1.2235 | 6000  | 1.6100          |
| 1.6806        | 1.3254 | 6500  | 1.5978          |
| 1.6809        | 1.4274 | 7000  | 1.5920          |
| 1.6766        | 1.5294 | 7500  | 1.5831          |
| 1.6598        | 1.6313 | 8000  | 1.5748          |
| 1.6632        | 1.7333 | 8500  | 1.5646          |
| 1.6433        | 1.8352 | 9000  | 1.5554          |
| 1.6317        | 1.9372 | 9500  | 1.5552          |
| 1.6141        | 2.0392 | 10000 | 1.5404          |
| 1.6328        | 2.1411 | 10500 | 1.5393          |
| 1.5981        | 2.2431 | 11000 | 1.5330          |
| 1.6192        | 2.3450 | 11500 | 1.5260          |
| 1.6051        | 2.4470 | 12000 | 1.5198          |
| 1.6218        | 2.5489 | 12500 | 1.5162          |
| 1.5721        | 2.6509 | 13000 | 1.5079          |
| 1.5656        | 2.7529 | 13500 | 1.5109          |
| 1.5642        | 2.8548 | 14000 | 1.5077          |
| 1.5715        | 2.9568 | 14500 | 1.5106          |
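
The step and epoch columns also allow a rough estimate of the run's size. The sketch below derives the number of optimizer steps per epoch from the last logged row, then an upper bound on the number of training sequences, assuming each step consumes a full batch of 32 sequences. These are estimates inferred from the table, not figures reported by the training run, and the function name is illustrative.

```python
def estimate_steps_per_epoch(step: int, epoch: float) -> int:
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return round(step / epoch)


# Last logged row of the table; later rows give the most precise ratio
# because the epoch column is rounded to four decimals.
last_step, last_epoch = 14_500, 2.9568

steps_per_epoch = estimate_steps_per_epoch(last_step, last_epoch)
total_steps = steps_per_epoch * 3          # num_epochs: 3
approx_sequences = steps_per_epoch * 32    # total_train_batch_size: 32 (upper bound)

print(steps_per_epoch)   # prints 4904
print(total_steps)       # prints 14712
print(approx_sequences)  # prints 156928
```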


### Framework versions

- Transformers 4.52.4
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1