---
library_name: transformers
license: cc-by-nc-4.0
base_model: facebook/mms-1b-all
tags:
- generated_from_trainer
metrics:
- wer
- bleu
- rouge
model-index:
- name: ardzdirect
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# ardzdirect

This model is a fine-tuned version of [facebook/mms-1b-all](https://huggingface.co/facebook/mms-1b-all) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2738
- Wer: 0.4396
- Bleu: 0.3030
- Rouge: {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0}
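
For readers unfamiliar with the Wer figure above, the following is a minimal sketch of how word error rate is commonly defined: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The card does not state which metric implementation produced the reported value, so the function name `word_error_rate` and the example sentences are illustrative assumptions, not the evaluation code used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the original training code).
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                       # deletion
                    curr[j - 1] + 1,                   # insertion
                    prev[j - 1] + substitution_cost,   # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref_words)


# Hypothetical example: one substitution and one deletion over 6 reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a Wer of 0.4396 corresponds to roughly 44 word-level errors per 100 reference words. Corpus-level implementations usually sum errors and reference lengths over all utterances before dividing, which can differ from averaging per-utterance scores.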

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 20
- mixed_precision_training: Native AMP
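
As a rough illustration of how some of these values relate, the sketch below reproduces the total batch size arithmetic and the shape of a linear schedule with warmup. It assumes a single device (consistent with 8 × 4 = 32) and uses 2400 total steps, which is only the last step logged in the results table, not a confirmed training length. The function names are hypothetical and this is not the Trainer's own code.

```python
def effective_batch_size(per_device_batch: int, accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Samples contributing to each optimizer update (num_devices=1 is assumed)."""
    return per_device_batch * accumulation_steps * num_devices


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the hyperparameter list; TOTAL_STEPS is an assumption.
BASE_LR = 0.001
WARMUP_STEPS = 100
TOTAL_STEPS = 2400  # assumed: last step shown in the results table

total_batch = effective_batch_size(per_device_batch=8, accumulation_steps=4)
lr_mid_warmup = linear_lr(50, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
lr_peak = linear_lr(WARMUP_STEPS, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
lr_end = linear_lr(TOTAL_STEPS, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
```

With these assumptions the learning rate rises linearly to its peak of 0.001 at step 100 and then falls linearly toward zero by the end of training.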

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Wer    | Bleu   | Rouge                                                           |
|:-------------:|:-------:|:----:|:---------------:|:------:|:------:|:---------------------------------------------------------------:|
| 2.9801        | 0.8316  | 100  | 0.4689          | 0.6744 | 0.0952 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.5487        | 1.6570  | 200  | 0.3861          | 0.6112 | 0.1243 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.4999        | 2.4823  | 300  | 0.3645          | 0.5976 | 0.1358 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.482         | 3.3077  | 400  | 0.3483          | 0.5838 | 0.1788 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.4421        | 4.1331  | 500  | 0.3513          | 0.5801 | 0.1489 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.4465        | 4.9647  | 600  | 0.3361          | 0.5699 | 0.1562 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.4425        | 5.7900  | 700  | 0.3482          | 0.5886 | 0.1368 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.4049        | 6.6154  | 800  | 0.3304          | 0.5537 | 0.1708 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.4011        | 7.4407  | 900  | 0.3155          | 0.5451 | 0.2087 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.4033        | 8.2661  | 1000 | 0.3319          | 0.5522 | 0.1835 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3804        | 9.0915  | 1100 | 0.2983          | 0.5172 | 0.2160 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3672        | 9.9231  | 1200 | 0.2957          | 0.5170 | 0.2282 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3661        | 10.7484 | 1300 | 0.2999          | 0.5152 | 0.2346 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3599        | 11.5738 | 1400 | 0.2952          | 0.4998 | 0.2372 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3475        | 12.3992 | 1500 | 0.3041          | 0.4961 | 0.2493 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3387        | 13.2245 | 1600 | 0.2971          | 0.5205 | 0.2432 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3474        | 14.0499 | 1700 | 0.2948          | 0.4808 | 0.2616 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3238        | 14.8815 | 1800 | 0.2809          | 0.4860 | 0.2669 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3156        | 15.7069 | 1900 | 0.2740          | 0.4544 | 0.2857 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3185        | 16.5322 | 2000 | 0.2736          | 0.4627 | 0.2799 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3148        | 17.3576 | 2100 | 0.2793          | 0.4455 | 0.2959 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.2996        | 18.1830 | 2200 | 0.2705          | 0.4451 | 0.2943 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.3112        | 19.0083 | 2300 | 0.2708          | 0.4440 | 0.2986 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |
| 0.2916        | 19.8399 | 2400 | 0.2738          | 0.4396 | 0.3030 | {'rouge1': 0.0, 'rouge2': 0.0, 'rougeL': 0.0, 'rougeLsum': 0.0} |


### Framework versions

- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0