---
language:
- ca
datasets:
- projecte-aina/3catparla_asr
- projecte-aina/corts_valencianes_asr_a
tags:
- audio
- automatic-speech-recognition
- whisper-large-v3
- barcelona-supercomputing-center
license: apache-2.0
library_name: transformers
base_model:
- openai/whisper-large-v3
---
# faster-whisper-3cat-cv21-valencian
## Table of Contents
<details>
<summary>Click to expand</summary>
- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Conversion Details](#conversion-details)
- [Citation](#citation)
- [Additional Information](#additional-information)
</details>
## Model Description
The "BSC-LT/faster-whisper-3cat-cv21-valencian" is an acoustic model based on a [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master) version of [BSC-LT/whisper-3cat-cv21-valencian](https://huggingface.co/langtech-veu/whisper-3cat-cv21-valencian)
## Intended Uses and Limitations
This model is the result of converting [BSC-LT/whisper-3cat-cv21-valencian](https://huggingface.co/langtech-veu/whisper-3cat-cv21-valencian) into a lighter model using the Python module [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).
The model can be used for Automatic Speech Recognition (ASR) in Catalan, with a particular focus on the Valencian accent. It is intended to transcribe Catalan audio files into plain text without punctuation.
## How to Get Started with the Model
<!--
To see an updated and functional version of this code, please visit our [Notebook](https://colab.research.google.com/drive/1MHiPrffNTwiyWeUyMQvSdSbfkef_8aJC?usp=sharing)
-->
### Installation
To use this model, first install [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).
Create a virtual environment:
```bash
python -m venv /path/to/venv
```
Activate the environment:
```bash
source /path/to/venv/bin/activate
```
Install the modules:
```bash
pip install faster-whisper
```
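Optionally, you can verify that the package imports correctly inside the activated environment:
```bash
# Quick sanity check: the import should succeed without errors
python -c "from faster_whisper import WhisperModel; print('faster-whisper OK')"
```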
### For Inference
To transcribe audio in Catalan using this model, you can follow this example:
```python
from faster_whisper import WhisperModel

model_size = "BSC-LT/faster-whisper-3cat-cv21-valencian"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")
# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_in_catalan.mp3", beam_size=5, task="transcribe", language="ca")

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
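Since the model produces plain text without punctuation, a common follow-up is to collect the segment texts into a single transcript and save it to a file. The sketch below is only an example: the input and output file names are placeholders, and `transcribe` returns a lazy generator, so the segments are consumed in a single pass.
```python
# Transcribe (the segments generator is consumed once it is iterated)
segments, info = model.transcribe("audio_in_catalan.mp3", beam_size=5, task="transcribe", language="ca")

# Join the segment texts into one plain-text transcript
transcript = " ".join(segment.text.strip() for segment in segments)

# Write the transcript to a UTF-8 text file (output path is just an example)
with open("audio_in_catalan.txt", "w", encoding="utf-8") as f:
    f.write(transcript + "\n")
```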
## Conversion Details
### Conversion procedure
This model is not a direct result of training. It is a conversion of [BSC-LT/whisper-3cat-cv21-valencian](https://huggingface.co/langtech-veu/whisper-3cat-cv21-valencian), a fine-tuned [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) model, using [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master). The procedure to create the model is as follows:
```bash
ct2-transformers-converter --model BSC-LT/whisper-3cat-cv21-valencian \
  --output_dir faster-whisper-3cat-cv21-valencian \
  --copy_files preprocessor_config.json \
  --quantization float16
```
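The `ct2-transformers-converter` script is provided by the `ctranslate2` package (a dependency of faster-whisper) and typically also requires `transformers` and `torch` to be installed. Once converted, the model can be loaded from the local output directory instead of the Hugging Face Hub; a minimal sketch, assuming the converter was run in the current working directory:
```python
from faster_whisper import WhisperModel

# Load the converted CTranslate2 model from the local output directory
# (the path assumes the converter was run in the current working directory)
model = WhisperModel("faster-whisper-3cat-cv21-valencian", device="cuda", compute_type="float16")
```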
## Citation
If this model contributes to your research, please cite the work:
<!--
```bibtex
@inproceedings{hernandez20243catparla,
title={3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition},
author={Hern{\'a}ndez Mena, Carlos Daniel and Armentano Oller, Carme and Solito, Sarah and K{\"u}lebi, Baybars},
booktitle={Proc. IberSPEECH 2024},
pages={176--180},
year={2024}
}
```
-->
```bibtex
@misc{BSC2025-fasterwhisper3catcv21valencian,
title={Recognition models for adaptation to Catalan variants},
author={Hernandez Mena, Carlos Daniel and Messaoudi, Abir and Armentano Oller, Carme and España i Bonet, Cristina},
organization={Barcelona Supercomputing Center},
url={https://huggingface.co/BSC-LT/faster-whisper-3cat-cv21-valencian},
year={2025}
}
```
## Additional Information
### Author
The conversion process was performed in June 2025 at the [Language Technologies Laboratory](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/).
### Contact
For further information, please email <[email protected]>.
### Copyright
Copyright (c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing Center.
### License
[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
### Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project ILENIA with reference 2022/TL22/00215337.
The conversion of the model was possible thanks to the computing time provided by [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.
We acknowledge the EuroHPC Joint Undertaking for awarding us access to MareNostrum 5 at BSC, Spain.