---
language:
- en
tags:
- automotive
---

WG-BERT (Warranty and Goodwill BERT) is a pretrained encoder-based model for analyzing automotive entities in automotive-related texts. WG-BERT is trained by continually
pretraining the BERT language model on the automotive domain, using a corpus of automotive (workshop feedback) texts with the masked language modeling (MLM) objective.
WG-BERT is further fine-tuned for automotive entity recognition (a subtask of Named Entity Recognition (NER)) to extract components and their complaints from automotive texts.
The dataset for continual pretraining consists of 1.8 million workshop feedback texts containing ~4 million sentences.
The dataset for fine-tuning consists of ~5,500 sentences gold-annotated by automotive domain experts.
We use the BERT-base-uncased architecture for training.
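
A minimal usage sketch for the entity-recognition model, assuming the fine-tuned checkpoint is available as a token-classification model on the Hugging Face Hub; the repo ID below is a placeholder, not the actual model path:

```python
# Minimal inference sketch; "path/to/WG-BERT" is a placeholder repo ID --
# substitute the actual Hub path of the fine-tuned checkpoint.
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_id = "path/to/WG-BERT"  # placeholder, not the real repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Aggregate subword predictions into whole entity spans
# (components and their complaints).
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

print(ner("Customer reports that the fuel pump is leaking."))
```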

Please contact Lukas Weber (lukas-weber[at]hotmail[dot]de / lukas.l.weber[at]mercedes-benz[dot]com) with any WG-BERT-related issues and questions.