---
library_name: transformers
license: mit
base_model: facebook/m2m100_1.2B
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: neutral_job_title_rephraser_pl
  results:
  - task:
      type: text2text-generation
      name: Gender-Neutral Job Title Rephrasing
    dataset:
      type: ArielUW/jobtitles
      name: Job Titles Dataset
      config: default
      split: test
    metrics:
    - type: bleu
      value: 93.9441
      name: BLEU
    - type: precision  
      value: 1.0
      name: Attempted Noun Neutralisation Precision
    - type: recall
      value: 0.892
      name: Attempted Noun Neutralisation Recall
    - type: levenshtein  
      value: 0.0395
      name: Normalized Levenshtein Distance (neutralization needed)
    - type: levenshtein  
      value: 0.0001
      name: Normalized Levenshtein Distance (neutralization not needed)
datasets:
- ArielUW/jobtitles
---

# neutral_job_title_rephraser_pl

This model is a fine-tuned version of [facebook/m2m100_1.2B](https://huggingface.co/facebook/m2m100_1.2B) on the [ArielUW/jobtitles](https://huggingface.co/datasets/ArielUW/jobtitles) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.7263
- Bleu: 93.9441
- Gen Len: 36.358

## Model description

The aim of this model is to rewrite single Polish sentences so that gendered job titles are replaced with gender-neutral terms. The optimal outcome looks like this:<br>
  *Jestem pracownikiem tej firmy.* ("I am an employee of this company.", masculine noun)<br>
  turns into<br>
  *Jestem osobą pracowniczą tej firmy.* (gender-neutral phrasing)<br>
Sentences that do not contain such terms are expected to pass through unchanged, for example:<br>
  *Mam uroczego kotka.* ("I have a cute kitten.")<br>
  turns into<br>
  *Mam uroczego kotka.*<br>

For actual outcomes and typical errors in the outputs, see our [readme](https://github.com/ArielUW/IMLLA-FinalProject/blob/main/README.md).

## Model usage

To use this model, you will need to install the transformers and sentencepiece libraries:

    pip install transformers sentencepiece

You can then use the model directly through the pipeline API, which provides a high-level interface for text generation:

    from transformers import pipeline
    pipe = pipeline("text2text-generation", model="mongrz/model_output")
    gender_neutral_text = pipe("Pielęgniarki protestują pod sejmem.")
    print(gender_neutral_text)
    # expected output: [{'generated_text': 'Osoby pielęgniarskie protestują pod sejmem.'}]

This creates a pipeline object for text-to-text generation with the model. Passing an input sentence to the `pipe` object generates its gender-neutral version; the output is a list of dictionaries, each containing the generated text.
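
The pipeline can also be run over several independent sentences, one call per sentence; a minimal usage sketch reusing the example sentences from this card:

    # `pipe` is the text2text-generation pipeline created above.
    sentences = [
        "Pielęgniarki protestują pod sejmem.",
        "Mam uroczego kotka.",
    ]
    for sentence in sentences:
        result = pipe(sentence)
        print(result[0]["generated_text"])
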
Alternatively, you can load the tokenizer and model manually for more fine-grained control:
    
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained("mongrz/model_output")
    model = AutoModelForSeq2SeqLM.from_pretrained("mongrz/model_output")
    
    text_to_translate = "Pielęgniarki protestują pod sejmem." 
    model_inputs = tokenizer(text_to_translate, return_tensors="pt")
    
    # Generate gender-neutral text
    gen_tokens = model.generate(**model_inputs, forced_bos_token_id=tokenizer.get_lang_id("pl"))
    
    # Decode and print the generated text
    print(tokenizer.batch_decode(gen_tokens, skip_special_tokens=True))
    
This approach gives you direct access to the tokenizer and model, so you can customize the generation process further if needed. Choose the method that best suits your needs.
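
As an illustration, decoding can be tuned with standard `generate()` arguments; in this minimal sketch, `num_beams` and `max_new_tokens` are illustrative choices, not the settings used to produce the reported results:

    # Beam search with a length cap, building on the manual-loading example
    # above; the values here are illustrative, not the evaluated settings.
    gen_tokens = model.generate(
        **model_inputs,
        forced_bos_token_id=tokenizer.get_lang_id("pl"),
        num_beams=4,
        max_new_tokens=128,
    )
    print(tokenizer.batch_decode(gen_tokens, skip_special_tokens=True))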
  
## Intended uses & limitations
While this model demonstrates promising results in generating gender-neutral job titles in Polish, it has certain limitations:

  - Low-Frequency Items: The model may struggle with less common job titles or words that were not frequently present in the training data. It might produce inaccurate or unexpected outputs for such cases.
  - Morphosyntactically Complex Cases: Items requiring rare or non-typical patterns of forming personatives can pose challenges for the model. The accuracy of the generated output may decrease in such scenarios.
  - Feminine Nouns: The model has been shown to sometimes underperform on feminine nouns, potentially due to biases or patterns in the training data. Further investigation and fine-tuning are needed to address this limitation.
  - Single Sentence Input: The model is optimized for single-sentence inputs and might not produce the desired results for single-word items, longer texts, or paragraphs; one workaround for longer texts is to split them into sentences first (see the sketch at the end of this section). It might fail to maintain context, coherence, and terminological consistency across multiple sentences. Its performance on single-word items has not been tested.
  - Domain Specificity: The model is trained on a specific dataset of single sentences with job titles and without them. It may not generalize well to other domains or contexts. It might need further fine-tuning to adapt to different types of text or specific vocabulary.

For more information on issues, errors, and limitations, see our [readme](https://github.com/ArielUW/IMLLA-FinalProject/blob/main/README.md).
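
As a rough workaround for the single-sentence limitation, a longer passage can be split into sentences and rephrased one sentence at a time. The sketch below uses a naive regex splitter together with the pipeline from the usage section; the splitting heuristic is illustrative and not part of the model or its evaluation:

    import re
    from transformers import pipeline

    pipe = pipeline("text2text-generation", model="mongrz/model_output")

    text = "Pielęgniarki protestują pod sejmem. Mam uroczego kotka."
    # Naive split on sentence-final punctuation followed by whitespace;
    # swap in a proper sentence splitter for real use.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    rephrased = [pipe(s)[0]["generated_text"] for s in sentences]
    print(" ".join(rephrased))
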
## Evaluation metrics

This model was evaluated using several metrics to assess its performance:

  - BLEU (Bilingual Evaluation Understudy): BLEU is a widely used metric for evaluating machine translation quality. It measures the overlap between the generated text and the reference text in terms of n-grams. A higher BLEU score indicates better translation quality. The model achieved a BLEU score of 93.9441 on the evaluation set, indicating high accuracy in generating gender-neutral terms.
  - Attempted Noun Neutralisation Precision: This metric measures the proportion of correctly attempted neutralizations (i.e., attempts made on items that actually required neutralization, not necessarily correctly formed neutral items) out of all attempted neutralizations. The model achieved a precision of 1.0, indicating that every attempted neutralization was performed on an item that required it.
  - Attempted Noun Neutralisation Recall: This metric measures the proportion of nouns with a neutralization attempt present in the generated text out of all nouns that should have been neutralized. The model achieved a recall of 0.892, suggesting that it successfully recognized items requiring neutralization in the majority of cases.
  - Normalized Levenshtein Distance: This metric calculates the edit distance between the generated text and the reference text, normalized by the length of the reference text, and thus measures how far the generated text deviates from the reference. The model achieved a normalized Levenshtein distance of 0.0395 for sentences requiring neutralization and 0.0001 for items that should not have been changed at all, indicating a high degree of similarity between the generated and reference texts (a sketch of this computation follows below).
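
A minimal sketch of how such a normalized edit distance can be computed, assuming character-level edits normalized by the reference length (the authors' exact evaluation script may differ; see the readme linked below):

    def levenshtein(a: str, b: str) -> int:
        # Classic dynamic-programming edit distance (insertions, deletions,
        # substitutions), computed row by row.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(
                    prev[j] + 1,               # deletion
                    curr[j - 1] + 1,           # insertion
                    prev[j - 1] + (ca != cb),  # substitution
                ))
            prev = curr
        return prev[-1]

    generated = "Jestem osobą pracowniczą tej firmy."
    reference = "Jestem osobą pracowniczą tej firmy."
    # Normalizing by the reference length is an assumption about the metric.
    print(levenshtein(generated, reference) / len(reference))  # 0.0 here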

More information on the evaluation outcomes can be found in [our readme](https://github.com/ArielUW/IMLLA-FinalProject/blob/main/README.md).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 7
- mixed_precision_training: Native AMP
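
For reference, these settings correspond roughly to the following `Seq2SeqTrainingArguments`; this is a reconstruction sketch, not the exact training script, and `output_dir` (plus any argument not listed above) is a placeholder:

    from transformers import Seq2SeqTrainingArguments

    # Approximate reconstruction of the hyperparameters listed above;
    # output_dir and anything not listed are illustrative placeholders.
    training_args = Seq2SeqTrainingArguments(
        output_dir="model_output",
        learning_rate=2e-5,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        gradient_accumulation_steps=4,  # 32 * 4 = 128 total (single device assumed)
        num_train_epochs=7,
        lr_scheduler_type="linear",
        optim="adafactor",
        seed=42,
        fp16=True,                      # mixed precision (native AMP)
        predict_with_generate=True,     # generate during eval for BLEU / Gen Len
    )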

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Bleu    | Gen Len |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|
| 23.051        | 1.0    | 38   | 4.3445          | 89.5045 | 35.746  |
| 15.9099       | 2.0    | 76   | 3.5044          | 91.9617 | 36.366  |
| 12.7846       | 3.0    | 114  | 2.8211          | 92.7676 | 36.22   |
| 10.3083       | 4.0    | 152  | 2.3006          | 93.675  | 36.284  |
| 8.4622        | 5.0    | 190  | 1.9316          | 93.6498 | 36.348  |
| 7.3015        | 6.0    | 228  | 1.7263          | 93.9441 | 36.358  |
| 6.8211        | 6.8212 | 259  | 1.6685          | 93.7274 | 36.306  |


### Framework versions

- Transformers 4.47.1
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0