---
model-index:
- name: mHuBERT-147-br
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_15_0
      type: common_voice_15_0
      config: br
      split: test
      args: br
    metrics:
    - name: WER
      type: wer
      value: 47.0
    - name: CER
      type: cer
      value: 16.7
language:
- br
metrics:
- wer
base_model: utter-project/mHuBERT-147
pipeline_tag: automatic-speech-recognition
datasets:
- mozilla-foundation/common_voice_15_0
---


# mHuBERT-147-br

This model is a fine-tuned version of [utter-project/mHuBERT-147](https://huggingface.co/utter-project/mHuBERT-147) on the Mozilla Common Voice 15 Breton dataset and the [Roadennoù](https://github.com/gweltou/roadennou) dataset.
It achieves the following results on the validation set:
- Loss: 0.7331
- WER: 50.09
- CER: 16.45
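
WER and CER can be computed with the Hugging Face `evaluate` library. A minimal sketch with placeholder transcripts (`predictions` and `references` below are hypothetical; real values come from decoding the validation set):

```python
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Placeholder transcripts; in practice these come from model decoding.
predictions = ["demat dit"]
references = ["demat deoc'h"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```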

## Model description

This model was trained to assess the performance of mHuBERT-147 when fine-tuned for Breton automatic speech recognition.

## Intended uses & limitations

This is a research model and should not be used in production.
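
For research experimentation, the checkpoint can be loaded through the `transformers` ASR pipeline. A minimal sketch, assuming the model id below matches where the checkpoint is published (it is a placeholder here), and that the input audio is 16 kHz mono:

```python
from transformers import pipeline

# Placeholder model id; substitute the actual Hub repo id or a local checkpoint path.
asr = pipeline("automatic-speech-recognition", model="mHuBERT-147-br")

# Transcribe a recording; HuBERT-based models expect 16 kHz mono input.
result = asr("sample_br.wav")
print(result["text"])
```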

## Training and evaluation data

90% of the Roadennoù dataset was used for training; the remaining 10% was used for validation, together with the MCV15-br validation split.
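
A minimal sketch of how such a split can be produced with the `datasets` library (the Roadennoù loading path is an assumption, since the dataset is distributed via the GitHub repository above, and the concatenation assumes both datasets have been preprocessed to the same columns):

```python
from datasets import load_dataset, concatenate_datasets

# Assumed local export of Roadennoù in audiofolder layout; the actual format may differ.
roadennou = load_dataset("audiofolder", data_dir="roadennou/")["train"]

# 90/10 train/validation split with a fixed seed for reproducibility.
split = roadennou.train_test_split(test_size=0.1, seed=42)

# The MCV15 Breton validation split is added to the held-out data.
mcv_valid = load_dataset(
    "mozilla-foundation/common_voice_15_0", "br", split="validation"
)

train_data = split["train"]
valid_data = concatenate_datasets([split["test"], mcv_valid])
```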

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):
- learning_rate: 3.8e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 52
- mixed_precision_training: Native AMP
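
These settings map onto `transformers.TrainingArguments` roughly as follows. A sketch, not the exact training script; the output path is an assumption, and the Adam betas/epsilon listed above are the `Trainer` defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mhubert-147-br",    # assumed output path
    learning_rate=3.8e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 8 * 2 = 16
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=52,
    fp16=True,                      # Native AMP mixed-precision training
)
```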

### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.1+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2