output

This model is a fine-tuned version of neuralmind/bert-base-portuguese-cased on a dataset that this card does not name (the training script recorded it as "None"). It achieves the following results on the evaluation set:

  • Loss: 0.6440

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10000
  • num_epochs: 15.0
  • mixed_precision_training: Native AMP
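The batch-size and learning-rate settings above interact. Here is a minimal sketch of that arithmetic. It assumes a single device, which is consistent with 16 × 8 = 128. It also assumes the `linear` scheduler type follows the formula of `transformers.get_linear_schedule_with_warmup`: a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the final step. `TOTAL_STEPS` is a hypothetical placeholder, since the card does not state the total number of optimizer steps.

```python
"""Sketch of the batch-size arithmetic and the linear warmup/decay schedule
implied by the hyperparameters above (single device assumed)."""

LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 16            # per-device batch size
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_STEPS = 10_000
# Hypothetical: the card does not give the total step count; 170,000 is
# simply the last step logged in the training-results table.
TOTAL_STEPS = 170_000


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device * accumulation * num_devices


def linear_warmup_lr(step: int, peak_lr: float, warmup: int, total: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    remaining = max(0, total - step)
    return peak_lr * remaining / max(1, total - warmup)


if __name__ == "__main__":
    print("total_train_batch_size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 5_000, 10_000, 90_000, TOTAL_STEPS):
        lr = linear_warmup_lr(step, LEARNING_RATE, WARMUP_STEPS, TOTAL_STEPS)
        print(f"step {step:>7}: lr = {lr:.2e}")
```

Under these assumptions, the learning rate peaks at 1e-4 exactly at step 10,000. It then falls to half of that midway through the remaining steps.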

Training results

Training Loss   Epoch   Step     Validation Loss
1.1985          0.22    2500     1.0940
1.0937          0.44    5000     1.0033
1.0675          0.66    7500     0.9753
1.0565          0.87    10000    0.9801
1.0244          1.09    12500    0.9526
0.9943          1.31    15000    0.9298
0.9799          1.53    17500    0.9035
0.95            1.75    20000    0.8835
0.933           1.97    22500    0.8636
0.9079          2.18    25000    0.8507
0.8938          2.4     27500    0.8397
0.8781          2.62    30000    0.8195
0.8647          2.84    32500    0.8088
0.8422          3.06    35000    0.7954
0.831           3.28    37500    0.7871
0.8173          3.5     40000    0.7721
0.8072          3.71    42500    0.7611
0.8011          3.93    45000    0.7532
0.7828          4.15    47500    0.7431
0.7691          4.37    50000    0.7367
0.7659          4.59    52500    0.7292
0.7606          4.81    55000    0.7245
0.8082          5.02    57500    0.7696
0.8114          5.24    60000    0.7695
0.8022          5.46    62500    0.7613
0.7986          5.68    65000    0.7558
0.8018          5.9     67500    0.7478
0.782           6.12    70000    0.7435
0.7743          6.34    72500    0.7367
0.774           6.55    75000    0.7313
0.7692          6.77    77500    0.7270
0.7604          6.99    80000    0.7200
0.7468          7.21    82500    0.7164
0.7486          7.43    85000    0.7117
0.7399          7.65    87500    0.7043
0.7306          7.86    90000    0.6956
0.7243          8.08    92500    0.6959
0.7132          8.3     95000    0.6916
0.71            8.52    97500    0.6853
0.7128          8.74    100000   0.6855
0.7088          8.96    102500   0.6809
0.7002          9.18    105000   0.6784
0.6953          9.39    107500   0.6737
0.695           9.61    110000   0.6714
0.6871          9.83    112500   0.6687
0.7161          10.05   115000   0.6961
0.7265          10.27   117500   0.7006
0.7284          10.49   120000   0.6941
0.724           10.7    122500   0.6887
0.7266          10.92   125000   0.6931
0.7051          11.14   127500   0.6846
0.7106          11.36   130000   0.6816
0.7011          11.58   132500   0.6830
0.6997          11.8    135000   0.6784
0.6969          12.02   137500   0.6734
0.6968          12.23   140000   0.6709
0.6867          12.45   142500   0.6656
0.6925          12.67   145000   0.6661
0.6795          12.89   147500   0.6606
0.6774          13.11   150000   0.6617
0.6756          13.33   152500   0.6563
0.6728          13.54   155000   0.6547
0.6732          13.76   157500   0.6520
0.6704          13.98   160000   0.6492
0.6666          14.2    162500   0.6446
0.6615          14.42   165000   0.6488
0.6638          14.64   167500   0.6523
0.6588          14.85   170000   0.6415
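The results table is easier to audit when treated as data. Here is a minimal sketch that does two things: it picks the evaluation with the lowest validation loss, and it flags evaluations where validation loss rose compared with the previous one. For brevity it only loads the last four rows of the table, copied verbatim. The `EvalRow` type and both helper functions are illustrative names, not part of any library.

```python
"""Sketch: read training-results rows as data, find the lowest validation
loss, and flag evaluations where validation loss went up."""
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    training_loss: float
    epoch: float
    step: int
    validation_loss: float


# Last four rows of the training-results table, copied verbatim.
ROWS: List[EvalRow] = [
    EvalRow(0.6666, 14.2, 162500, 0.6446),
    EvalRow(0.6615, 14.42, 165000, 0.6488),
    EvalRow(0.6638, 14.64, 167500, 0.6523),
    EvalRow(0.6588, 14.85, 170000, 0.6415),
]


def best_row(rows: List[EvalRow]) -> EvalRow:
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r.validation_loss)


def regressions(rows: List[EvalRow]) -> List[int]:
    """Steps whose validation loss is higher than at the previous evaluation."""
    return [cur.step for prev, cur in zip(rows, rows[1:])
            if cur.validation_loss > prev.validation_loss]


if __name__ == "__main__":
    best = best_row(ROWS)
    print(f"best step: {best.step} (validation loss {best.validation_loss})")
    print("validation loss rose at steps:", regressions(ROWS))
```

On these four rows, the lowest validation loss is at step 170000 (0.6415), and increases appear at steps 165000 and 167500. Run over the full table, the same check would also flag the jumps at epochs 5.02 and 10.05, among others.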

Framework versions

  • Transformers 4.12.5
  • Pytorch 1.10.1+cu113
  • Datasets 1.17.0
  • Tokenizers 0.10.3
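To reproduce results, it can help to check a local environment against the versions listed above. Here is a minimal sketch using the standard library's `importlib.metadata`. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers` correspond to the listed frameworks. It also ignores local build labels such as `+cu113` when comparing; that is a design choice, not something the card specifies.

```python
"""Sketch: compare installed package versions with the "Framework versions"
listed in this card (local build labels such as '+cu113' are ignored)."""
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional, Tuple

# Versions listed in the card; "Pytorch" is distributed on PyPI as "torch".
TRAINED_WITH: Dict[str, str] = {
    "transformers": "4.12.5",
    "torch": "1.10.1+cu113",
    "datasets": "1.17.0",
    "tokenizers": "0.10.3",
}


def public_version(v: Optional[str]) -> Optional[str]:
    """Drop a local version label, e.g. '1.10.1+cu113' -> '1.10.1'."""
    return None if v is None else v.split("+", 1)[0]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Installed version of each distribution, or None if it is not installed."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(
    expected: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Packages whose installed public version differs from the expected one."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if public_version(installed.get(name)) != public_version(want)
    }


if __name__ == "__main__":
    report = mismatches(TRAINED_WITH, installed_versions(TRAINED_WITH))
    for name, (want, have) in report.items():
        print(f"{name}: card lists {want}, installed {have or 'not installed'}")
    if not report:
        print("Installed versions match the card.")
```

Newer library versions will usually still load the checkpoint. The report only shows where an environment differs from the one used for training.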

Citing & Authors

If you use our work, please cite:

@incollection{Viegas_2023,
    doi = {10.1007/978-3-031-36805-9_24},
    url = {https://doi.org/10.1007%2F978-3-031-36805-9_24},
    year = 2023,
    publisher = {Springer Nature Switzerland},
    pages = {349--365},
    author = {Charles F. O. Viegas and Bruno C. Costa and Renato P. Ishii},
    title = {{JurisBERT}: A New Approach that~Converts a~Classification Corpus into~an~{STS} One},
    booktitle = {Computational Science and Its Applications {\textendash} {ICCSA} 2023}
}

Model files

  • Format: Safetensors
  • Model size: 109M params
  • Tensor types: I64, F32

Model tree

This model (alfaneo/bertimbaulaw-base-portuguese-cased) is fine-tuned from neuralmind/bert-base-portuguese-cased.