---
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: MiniLMv2-L6-H384-distilled-from-RoBERTa-Large-finetuned-wikitext103-mlm-multi-emails-hq-x2bs
  results: []
datasets:
  - postbot/multi-emails-hq
language:
  - en
pipeline_tag: fill-mask
widget:
  - text: Can you please send me the <mask> by the end of the day?
    example_title: end of day
  - text: >-
      I hope this email finds you well. I wanted to follow up on our <mask>
      yesterday.
    example_title: follow-up
  - text: The meeting has been rescheduled to <mask>.
    example_title: reschedule
  - text: Please let me know if you need any further <mask> regarding the project.
    example_title: further help
  - text: >-
      I appreciate your prompt response to my previous email. Can you provide an
      update on the <mask> by tomorrow?
    example_title: provide update
  - text: Paris is the <mask> of France.
    example_title: paris (default)
  - text: The goal of life is <mask>.
    example_title: goal of life (default)
---


# MiniLMv2-L6-H384-distilled-from-RoBERTa-Large-finetuned-wikitext103-mlm-multi-emails-hq-x2bs

This model is a fine-tuned version of [saghar/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large-finetuned-wikitext103](https://huggingface.co/saghar/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large-finetuned-wikitext103) on the [postbot/multi-emails-hq](https://huggingface.co/datasets/postbot/multi-emails-hq) dataset.
It achieves the following results on the evaluation set:
- Loss: 2.0371
- Accuracy: 0.6450
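
The loss can be read as a perplexity over masked tokens. The snippet below is a minimal sketch that assumes the reported loss is the mean cross-entropy in nats, which is the usual convention for masked-language-model training but is not stated in this card.

```python
import math

# Evaluation loss reported above; assumed to be mean cross-entropy in nats.
eval_loss = 2.0371

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"masked-token perplexity: {perplexity:.2f}")
```

Under that assumption the perplexity is about 7.67.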

## Model description


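- masked language model (MLM)
- MiniLMv2 architecture: a compact model with 6 layers and hidden size 384, distilled from RoBERTa-Large
- case-sensitive: uppercase text is supported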

## Intended uses & limitations

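The model is intended for masked-token prediction (`fill-mask`) on English email-style text. Its limitations have not been documented yet.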
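
A minimal usage sketch with the `transformers` fill-mask pipeline follows. The `model_id` argument is a placeholder: this card does not state the Hub namespace the model is published under, so pass the full repository id yourself. `format_predictions` and `predict_mask` are illustrative helper names, not part of any library.

```python
def format_predictions(predictions):
    """Render fill-mask pipeline output as 'token (score)' strings."""
    return [f"{p['token_str'].strip()} ({p['score']:.3f})" for p in predictions]


def predict_mask(model_id, text, top_k=5):
    """Load the model and return its top_k candidates for the <mask> token."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    return fill_mask(text, top_k=top_k)


# Example call (requires the transformers package and network access).
# Replace <namespace> with the account that hosts this model:
#
# preds = predict_mask(
#     "<namespace>/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large-finetuned-wikitext103-mlm-multi-emails-hq-x2bs",
#     "Can you please send me the <mask> by the end of the day?",
# )
# print("\n".join(format_predictions(preds)))
```

The input must contain the `<mask>` token, as in the widget examples in the card metadata.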

## Training and evaluation data

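The card metadata lists [postbot/multi-emails-hq](https://huggingface.co/datasets/postbot/multi-emails-hq) as the dataset. Split sizes and preprocessing have not been documented.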

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 16
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 16.0
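
To see how the scheduler settings above shape the learning rate, here is a plain-Python sketch of a cosine schedule with linear warmup. It assumes a single half-cosine decay to zero and a warmup length equal to the ceiling of `warmup_ratio * total_steps`. The total of 4928 optimizer steps is taken from the last row of the results table. The function `lr_at` is illustrative, not the Trainer's own code.

```python
import math

BASE_LR = 2e-4        # learning_rate
WARMUP_RATIO = 0.05   # lr_scheduler_warmup_ratio
TOTAL_STEPS = 4928    # last step in the results table (16 epochs x 308 steps)

# Assumption: warmup length is the ceiling of ratio * total steps.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Half-cosine decay from the base learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 0.0002 after 247 warmup steps and decays to zero at step 4928.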

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 3.2947        | 1.0   | 308  | 3.0832          | 0.5122   |
| 2.8727        | 2.0   | 616  | 2.6722          | 0.5662   |
| 2.6339        | 3.0   | 924  | 2.4797          | 0.5878   |
| 2.5053        | 4.0   | 1232 | 2.3833          | 0.6025   |
| 2.4531        | 5.0   | 1540 | 2.3085          | 0.6106   |
| 2.2852        | 6.0   | 1848 | 2.2451          | 0.6175   |
| 2.2280        | 7.0   | 2156 | 2.1937          | 0.6244   |
| 2.2013        | 8.0   | 2464 | 2.1446          | 0.6310   |
| 2.1463        | 9.0   | 2772 | 2.1062          | 0.6357   |
| 2.0882        | 10.0  | 3080 | 2.0847          | 0.6370   |
| 2.1669        | 11.0  | 3388 | 2.0687          | 0.6399   |
| 2.0983        | 12.0  | 3696 | 2.0629          | 0.6423   |
| 2.1215        | 13.0  | 4004 | 2.0259          | 0.6476   |
| 2.1255        | 14.0  | 4312 | 2.0378          | 0.6461   |
| 2.1751        | 15.0  | 4620 | 2.0257          | 0.6458   |
| 1.9516        | 16.0  | 4928 | 2.0371          | 0.6450   |
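
The headline metrics at the top of this card come from the final epoch, which is not the best row in the table. The sketch below copies the validation columns from the table and picks out the best epochs. Whether the intermediate checkpoints were saved is not stated in this card.

```python
# (epoch, validation_loss, accuracy) copied from the results table above.
RESULTS = [
    (1, 3.0832, 0.5122), (2, 2.6722, 0.5662), (3, 2.4797, 0.5878),
    (4, 2.3833, 0.6025), (5, 2.3085, 0.6106), (6, 2.2451, 0.6175),
    (7, 2.1937, 0.6244), (8, 2.1446, 0.6310), (9, 2.1062, 0.6357),
    (10, 2.0847, 0.6370), (11, 2.0687, 0.6399), (12, 2.0629, 0.6423),
    (13, 2.0259, 0.6476), (14, 2.0378, 0.6461), (15, 2.0257, 0.6458),
    (16, 2.0371, 0.6450),
]

best_loss = min(RESULTS, key=lambda row: row[1])  # lowest validation loss
best_acc = max(RESULTS, key=lambda row: row[2])   # highest accuracy
final = RESULTS[-1]                               # epoch reported in the summary

print(f"lowest validation loss: {best_loss[1]} at epoch {best_loss[0]}")
print(f"highest accuracy:       {best_acc[2]} at epoch {best_acc[0]}")
print(f"final epoch {final[0]}: loss {final[1]}, accuracy {final[2]}")
```

Validation loss is lowest at epoch 15 (2.0257) and accuracy is highest at epoch 13 (0.6476). Both are only marginally better than the final epoch, which suggests training had largely plateaued.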


### Framework versions

- Transformers 4.27.0.dev0
- Pytorch 2.0.0.dev20230212+cu118
- Datasets 2.9.0
- Tokenizers 0.13.2