---
model-index:
- name: artificialguybr/whisper-small-pt-cv13
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: mozilla-foundation/common_voice_13_0
      type: mozilla-foundation/common_voice_13_0
      config: pt
      split: test
    metrics:
    - type: wer
      value: 10.30
      name: WER
---

# Model Card for whisper-small-pt-cv13

This is a fine-tune of Whisper Small that targets better speech recognition results for Portuguese. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation.

## Model Details

Whisper is a Transformer-based encoder-decoder model, also referred to as a _sequence-to-sequence_ model. It was trained on 680k hours of labelled speech data annotated using large-scale weak supervision. This fine-tune uses Common Voice 13.0 to improve results for Portuguese.

- **Developed by:** [ArtificialGuyBr](https://twitter.com/@artificialguybr)
- **Shared by:** [ArtificialGuyBr](https://twitter.com/@artificialguybr)

## Uses

This repository contains a fine-tuned version of the Whisper ASR (Automatic Speech Recognition) system developed by OpenAI. The model has been fine-tuned specifically to improve performance on Portuguese.

### Out-of-Scope Use

The model has limitations, and there are uses it is not intended for:

1. **Misuse and Malicious Use**: This model should not be used for any illegal activity, including but not limited to eavesdropping, illegal surveillance, or any other form of privacy invasion. It is also not intended for creating or spreading misinformation, hate speech, or harmful content.
2. **Non-Portuguese Languages**: Because this model has been fine-tuned for Portuguese, it may not perform well on other languages. It is not recommended for transcribing multilingual content in which languages other than Portuguese are spoken.
3. **Low-Quality Audio**: The model's performance can be significantly affected by the quality of the input audio. It may not work well with low-quality recordings, background noise, or speakers who are far from the microphone.

## Training Details

### Training Procedure

Trained using the code from the Hugging Face (HF) Whisper Event.

#### Training Hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000

---

### 🌐 Website

You can find more of my models, projects, and information on my official website:

- **[artificialguy.com](https://artificialguy.com/)**

### 💖 Support My Work

If you find this model useful, please consider supporting my work. It helps me cover server costs and dedicate more time to new open-source projects.

- **Patreon:** [Support on Patreon](https://www.patreon.com/user?u=81570187)
- **Ko-fi:** [Buy me a Ko-fi](https://ko-fi.com/artificialguybr)
- **Buy Me a Coffee:** [Buy me a Coffee](https://buymeacoffee.com/jvkape)

## Evaluation

WER on the Common Voice 13.0 Portuguese test split: 10.3

- **Hardware Type:** 1x A100 80GB
- **Hours used:** 8
- **Cloud Provider:** [Redmond.ai](https://redmond.ai)
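For readers unfamiliar with the WER figure reported under Evaluation, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative, not part of this repository. The card does not state which text normalization was applied before scoring, so this sketch simply compares whitespace-split words as they are; the reported 10.3 is presumably this ratio expressed as a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (not from this repository). No text normalization
    is applied: words are compared exactly as they appear after splitting
    on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    wer = word_error_rate("o gato está em casa", "o gato esta em casa")
    print(f"WER: {100 * wer:.1f}%")  # prints "WER: 20.0%"
```

Over a whole test set, the usual convention is to sum the edit distances and the reference word counts across all utterances before dividing, rather than averaging per-utterance scores, so long utterances weigh more than short ones.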