---
base_model: openai/whisper-small
library_name: peft
license: mit
tags:
- whisper-small
- speech_to_text
- ASR
- french
language:
- fr
demo: https://huggingface.co/spaces/visalkao/whisper-small-french-finetuned
---
# Model Card for whisper-small-french-finetuned

A LoRA (PEFT) fine-tune of openai/whisper-small for French speech recognition, trained on the French Single Speaker Speech Dataset.
## Model Details
### Model Description
- **Developed by:** Visal KAO
- **Model type:** Speech Recognition
- **Language(s) (NLP):** French
- **License:** MIT
- **Finetuned from model:** openai/whisper-small
### Model Sources
- **Base model repository:** https://huggingface.co/openai/whisper-small
- **Demo:** https://huggingface.co/spaces/visalkao/whisper-small-french-finetuned
## Dataset
This model is fine-tuned on 50% of the French Single Speaker Speech Dataset from Kaggle (the Les Misérables subset only).
- **Link to dataset:** https://www.kaggle.com/datasets/bryanpark/french-single-speaker-speech-dataset
## Uses
The goal of this project is to fine-tune the Whisper-small model to improve its accuracy on French transcription.
Whisper-small was chosen for its small size and versatility: the primary objective is to fine-tune a compact model that still gives acceptable results.
### Direct Use
**Live Demo:** https://huggingface.co/spaces/visalkao/whisper-small-french-finetuned
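
A minimal inference sketch is shown below. It assumes the LoRA adapter is published under the same id as the demo Space (`visalkao/whisper-small-french-finetuned`); adjust the id if the weights live elsewhere.

```python
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

BASE_ID = "openai/whisper-small"
ADAPTER_ID = "visalkao/whisper-small-french-finetuned"  # assumed to match the demo Space name

# Load the base model and attach the LoRA adapter on top of it.
processor = WhisperProcessor.from_pretrained(BASE_ID, language="french", task="transcribe")
base_model = WhisperForConditionalGeneration.from_pretrained(BASE_ID)
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

def transcribe(audio_array, sampling_rate=16000):
    """Transcribe a 16 kHz mono waveform (e.g. loaded with soundfile or librosa)."""
    inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
    forced_ids = processor.get_decoder_prompt_ids(language="french", task="transcribe")
    with torch.no_grad():
        generated_ids = model.generate(
            input_features=inputs.input_features,
            forced_decoder_ids=forced_ids,
            max_new_tokens=225,
        )
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```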
## Bias, Risks, and Limitations
As this model has fewer than 250 million parameters, which is quite small for a speech-transcription model, it has its own limitations.
The Word Error Rate (WER) of this fine-tuned model is approximately 0.17 (17%).
For reference, the original Whisper-small's WER is around 0.27 (27%) on the same dataset.
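
As a rough check of the size claim, the sketch below counts the base model's parameters and the adapter's trainable parameters; it is illustrative only and again assumes the adapter id matches the demo Space name.

```python
from peft import PeftModel
from transformers import WhisperForConditionalGeneration

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
print(f"Base parameters: {sum(p.numel() for p in base.parameters()):,}")  # a bit under 250M

# Adapter id assumed to match the demo Space; is_trainable=True keeps the LoRA weights
# trainable so that print_trainable_parameters() reports them.
model = PeftModel.from_pretrained(base, "visalkao/whisper-small-french-finetuned", is_trainable=True)
model.print_trainable_parameters()
```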
## Training Hyperparameters
This model was trained using LoRA with the following hyperparameters (a configuration sketch follows the list):
* `per_device_train_batch_size=3`
* `gradient_accumulation_steps=1`
* `learning_rate=1e-3`
* `num_train_epochs=7`
* `evaluation_strategy="epoch"`
* `fp16=True`
* `per_device_eval_batch_size=1`
* `generation_max_length=225`
* `logging_steps=10`
* `remove_unused_columns=False`
* `label_names=["labels"]`
* `predict_with_generate=True`
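
These values correspond to the `Seq2SeqTrainingArguments` used with the Hugging Face `Trainer`. The sketch below shows how such a setup might look; the `LoraConfig` values (rank, alpha, dropout, target modules) and the output directory are common-choice assumptions, since the card only records the trainer arguments.

```python
from peft import LoraConfig, get_peft_model
from transformers import Seq2SeqTrainingArguments, WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# LoRA settings are typical choices for Whisper attention layers, not taken from this card.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-french-lora",  # assumed output directory
    per_device_train_batch_size=3,
    gradient_accumulation_steps=1,
    learning_rate=1e-3,
    num_train_epochs=7,
    evaluation_strategy="epoch",
    fp16=True,
    per_device_eval_batch_size=1,
    generation_max_length=225,
    logging_steps=10,
    remove_unused_columns=False,
    label_names=["labels"],
    predict_with_generate=True,
)
```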
## Results
Before fine-tuning, the Word Error Rate on this dataset is approximately 0.27.
After fine-tuning, it drops by about 0.10 to roughly 0.17, i.e. 17% WER on the test data (a sketch of the WER computation follows the training log).
Here is the training log:
| Epoch | Training Loss | Validation Loss | WER (%) |
|-------|---------------|-----------------|-----------|
| 1 | 0.369600 | 0.404414 | 26.665379 |
| 2 | 0.273200 | 0.361762 | 22.793976 |
| 3 | 0.308800 | 0.344289 | 24.454528 |
| 4 | 0.131600 | 0.318023 | 21.847847 |
| 5 | 0.117400 | 0.311023 | 19.134968 |
| 6 | 0.035700 | 0.301410 | 18.922572 |
| 7 | 0.013900 | 0.315151 | 16.972388 |
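
The WER values above are reported in percent. A minimal sketch of how WER can be computed with the Hugging Face `evaluate` library is shown below; the example strings and the absence of extra text normalization are illustrative assumptions, not taken from the training code.

```python
import evaluate

wer_metric = evaluate.load("wer")  # requires the `jiwer` package

# Illustrative placeholder strings; in practice these come from model.generate()
# decodings and the dataset's reference transcripts.
predictions = ["bonjour tout le monde"]
references = ["bonjour tous le monde"]

# evaluate returns a fraction; multiply by 100 to match the percentages reported above.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}%")
```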