Mollel/ASR-Swahili-Small

@@ -1,89 +1,35 @@
 ---
-library_name: transformers
-license: apache-2.0
 base_model: openai/whisper-small
-tags:
-- generated_from_trainer
 datasets:
-- common_voice_17_0
-metrics:
-- wer
 model-index:
-- name: ASR-Swahili-Small
   results:
   - task:
-      name: Automatic Speech Recognition
       type: automatic-speech-recognition
     dataset:
-      name: common_voice_17_0
-      type: common_voice_17_0
-      config: sw
-      split: test
-      args: sw
     metrics:
-    - name: Wer
-      type: wer
-      value: 43.87610007301795
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# ASR-Swahili-Small
-This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the common_voice_17_0 dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.6535
-- Model Preparation Time: 0.0032
-- Wer: 43.8761
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 32
-- eval_batch_size: 16
-- seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 128
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 50
-- num_epochs: 1
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Model Preparation Time | Wer     |
-|:-------------:|:------:|:----:|:---------------:|:----------------------:|:-------:|
-| 1.5279        | 0.1103 | 50   | 1.1295          | 0.0032                 | 64.1770 |
-| 0.8155        | 0.2206 | 100  | 0.8755          | 0.0032                 | 56.6888 |
-| 0.6529        | 0.3309 | 150  | 0.7871          | 0.0032                 | 49.3640 |
-| 0.5837        | 0.4413 | 200  | 0.7383          | 0.0032                 | 47.9315 |
-| 0.5479        | 0.5516 | 250  | 0.7044          | 0.0032                 | 46.3078 |
-| 0.5195        | 0.6619 | 300  | 0.6823          | 0.0032                 | 45.4835 |
-| 0.505         | 0.7722 | 350  | 0.6674          | 0.0032                 | 44.5285 |
-| 0.4985        | 0.8825 | 400  | 0.6570          | 0.0032                 | 43.8828 |
-| 0.4841        | 0.9928 | 450  | 0.6535          | 0.0032                 | 43.8761 |
-### Framework versions
-- Transformers 4.49.0
-- Pytorch 2.6.0+cu124
-- Datasets 3.3.1
-- Tokenizers 0.21.0
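
The removed card's hyperparameters and the new card's training-sample count can be cross-checked with simple arithmetic. This is a minimal sketch that assumes a single training device, which is consistent with the reported total_train_batch_size of 128. It also assumes the logged Epoch column is the fraction of one full pass over the training set completed at each Step.

```python
# Effective batch size per optimizer step: per-device batch x gradient accumulation.
# Assumes one device, which matches the card's total_train_batch_size of 128.
train_batch_size = 32
gradient_accumulation_steps = 4
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# (Step, Epoch) pairs copied from the "Training results" table above.
logged = [(50, 0.1103), (250, 0.5516), (450, 0.9928)]

# Assumption: Epoch is the fraction of one full pass over the training set,
# so Step / Epoch estimates the number of optimizer steps in a full epoch.
for step, epoch in logged:
    steps_per_epoch = step / epoch
    implied_samples = steps_per_epoch * total_train_batch_size
    print(f"step {step}: ~{steps_per_epoch:.1f} steps/epoch, ~{implied_samples:,.0f} training samples")
```

Each estimate lands near 58,000, which fits the new card's description of 58000 Swahili training samples. The small spread between rows comes from the Epoch values being rounded to four decimals.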

 ---
 base_model: openai/whisper-small
 datasets:
+- mozilla-foundation/common_voice_17_0
+language: sw
+library_name: transformers
+license: apache-2.0
 model-index:
+- name: Finetuned openai/whisper-small on Swahili
   results:
   - task:
       type: automatic-speech-recognition
+      name: Speech-to-Text
     dataset:
+      name: Common Voice (Swahili)
+      type: common_voice
     metrics:
+    - type: wer
+      value: 43.876
 ---
+# Fine-tuned openai/whisper-small on 58,000 Swahili training audio samples from mozilla-foundation/common_voice_17_0
+This model was created from the Mozilla.ai Blueprint:
+[speech-to-text-finetune](https://github.com/mozilla-ai/speech-to-text-finetune).
+## Evaluation results on 12,253 Swahili audio samples
+### Baseline model (before fine-tuning) on Swahili
+- Word Error Rate: 133.795
+- Loss: 2.459
+### Fine-tuned model (after fine-tuning) on Swahili
+- Word Error Rate: 43.876
+- Loss: 0.653
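
The baseline Word Error Rate of 133.795 is above 100, which can look like a mistake. Under the usual definition, WER = (S + D + I) / N × 100. Here S, D and I count substituted, deleted and inserted words, and N is the number of reference words. The card's values appear to be percentages. Insertions are not bounded by N, so a model that emits many extra words can score above 100. The sketch below is a minimal word-level Levenshtein implementation with an illustrative function name. It splits on whitespace and applies no text normalization. The card does not state which normalization or metric library produced its numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length x 100.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (starting from an empty reference).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # free when the words match
            deletion = prev[j] + 1                 # reference word missing from hypothesis
            insertion = curr[j - 1] + 1            # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("habari ya asubuhi", "habari gani za asubuhi leo hii")` returns about 133.3. The best alignment needs one substitution and three insertions over three reference words.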