---
license: mit
datasets:
- tobiolatunji/afrispeech-200
language:
- en
metrics:
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
finetuned_from: openai/whisper-small
tasks: automatic-speech-recognition
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
---
|
# Whisper Small Model Card |
|
|
|
|
|
|
Whisper Small is a pre-trained model for automatic speech recognition (ASR) and speech translation. It is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model, and was trained on 680,000 hours of labelled speech data annotated using large-scale weak supervision. The model has 244 million parameters and is multilingual.
|
|
|
## Performance

Whisper Small achieves high accuracy and generalizes well to many datasets and domains without fine-tuning.
|
|
|
## Usage

To transcribe audio samples, the model must be used together with a `WhisperProcessor`. The `WhisperProcessor` pre-processes the audio inputs (converting them to log-Mel spectrograms for the model) and post-processes the model outputs (converting them from tokens back to text), as shown in the sketch below.
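The following is a minimal transcription sketch against the `transformers` API. The dummy LibriSpeech dataset is used only for illustration; any 16 kHz mono waveform works.

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

# Load the processor (feature extractor + tokenizer) and the model
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Load a sample audio clip (illustrative; any 16 kHz mono waveform works)
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# Pre-process: raw waveform -> log-Mel spectrogram
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features

# Generate token ids, then post-process them back to text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```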
|
|
|
## References

- https://huggingface.co/openai/whisper-small
- https://github.com/openai/whisper
- https://openai.com/research/whisper
- https://www.assemblyai.com/blog/how-to-run-openais-whisper-speech-recognition-model/
|
|
|
|
|
## Model Details |
|
Whisper is a transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. |
|
It was trained on 680,000 hours of labelled speech data collected using large-scale weak supervision.
|
|
|
The models were trained on either English-only data or multilingual data. |
|
The English-only models were trained on the task of speech recognition. |
|
The multilingual models were trained on both speech recognition and speech translation. |
|
For speech recognition, the model predicts transcriptions in the same language as the audio. |
|
For speech translation, the model predicts a transcription in a different language from the audio (for Whisper, translation is into English).
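As a sketch under the `transformers` API, the task is selected at generation time via forced decoder ids; the source language shown here is an illustrative assumption.

```python
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# Speech recognition: output stays in the source language (French, as an example)
transcribe_ids = processor.get_decoder_prompt_ids(language="french", task="transcribe")

# Speech translation: output is English regardless of the source language
translate_ids = processor.get_decoder_prompt_ids(language="french", task="translate")

# Either set of ids is then passed to generation, e.g.
# predicted_ids = model.generate(input_features, forced_decoder_ids=transcribe_ids)
```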
|
|
|
|
|
## Uses |
|
|
|
|
- Transcription |
|
- Translation |
|
|
|
|
|
## Training Hyperparameters
|
|
- learning_rate: 1e-5 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 8 |
|
- lr_scheduler_warmup_steps: 500 |
|
- max_steps: 4000 |
|
- metric_for_best_model: wer |
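For reference, a minimal sketch of how these values map onto `transformers`' `Seq2SeqTrainingArguments`; `output_dir`, `evaluation_strategy`, and `load_best_model_at_end` are illustrative assumptions, not values recorded above.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-afrispeech",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=500,
    max_steps=4000,
    metric_for_best_model="wer",
    greater_is_better=False,       # lower WER is better
    evaluation_strategy="steps",   # assumed; required for metric_for_best_model
    load_best_model_at_end=True,   # assumed
)
```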