---
language:
- ca
datasets:
- projecte-aina/3catparla_asr
- projecte-aina/corts_valencianes_asr_a
tags:
- audio
- automatic-speech-recognition
- whisper-large-v3
- barcelona-supercomputing-center
license: apache-2.0
 
library_name: transformers
base_model:
- openai/whisper-large-v3

---
# faster-whisper-3cat-cv21-valencian

## Table of Contents
<details>
<summary>Click to expand</summary>

- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Conversion Details](#conversion-details)
- [Citation](#citation)
- [Additional Information](#additional-information)

</details>


## Model Description

"BSC-LT/faster-whisper-3cat-cv21-valencian" is an acoustic model: a [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master) conversion of [BSC-LT/whisper-3cat-cv21-valencian](https://huggingface.co/langtech-veu/whisper-3cat-cv21-valencian).

## Intended Uses and Limitations
This model is the result of converting [BSC-LT/whisper-3cat-cv21-valencian](https://huggingface.co/langtech-veu/whisper-3cat-cv21-valencian) into a lighter model with the Python module [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).
The model can be used for Automatic Speech Recognition (ASR) in Catalan, particularly for speech in the Valencian variety. It is intended to transcribe Catalan audio files to plain text without punctuation.
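
Because the transcripts carry no punctuation, a punctuated reference text usually needs the same treatment before the two can be compared, for example when computing a word error rate. The following is a minimal sketch of such a normalization step. The function name `normalize_reference`, the choice to lowercase, and the set of characters to preserve are assumptions made for illustration, not part of the model or its training recipe.

```python
import unicodedata

# Marks that belong to Catalan spelling and should survive normalization:
# apostrophes (l'aigua), the hyphen (dona-li) and the middle dot (col·legi).
KEEP = {"'", "’", "-", "·"}


def normalize_reference(text: str) -> str:
    """Lowercase the text and replace punctuation with spaces (illustrative helper)."""
    kept = []
    for ch in text:
        # Unicode categories starting with "P" cover all punctuation characters.
        if ch in KEEP or not unicodedata.category(ch).startswith("P"):
            kept.append(ch)
        else:
            kept.append(" ")
    # Collapse the runs of whitespace left behind by removed punctuation.
    return " ".join("".join(kept).lower().split())


print(normalize_reference("Bon dia, com esteu?"))
print(normalize_reference("El col·legi és a l'Horta."))
```

Whether lowercasing is appropriate depends on the casing of the model output and of your reference data, so check both before adopting it.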

## How to Get Started with the Model

<!--
To see an updated and functional version of this code, please visit our [Notebook](https://colab.research.google.com/drive/1MHiPrffNTwiyWeUyMQvSdSbfkef_8aJC?usp=sharing)
-->
### Installation

To use this model, install [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).

Create a virtual environment:
```bash
python -m venv /path/to/venv
```
Activate the environment:
```bash
source /path/to/venv/bin/activate
```
Install the package:
```bash
pip install faster-whisper
```
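
To confirm that the package landed in the active virtual environment, you can query the installed distribution metadata from Python. This is only a sketch using the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("faster-whisper")
if version is None:
    print("faster-whisper is not installed in this environment")
else:
    print(f"faster-whisper {version} is installed")
```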

### For Inference
To transcribe audio in Catalan using this model, you can follow this example:

```python
from faster_whisper import WhisperModel

# Hugging Face repository ID of the converted model
model_id = "BSC-LT/faster-whisper-3cat-cv21-valencian"

# Run on GPU with FP16
model = WhisperModel(model_id, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_id, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_id, device="cpu", compute_type="int8")

# The language is set explicitly to Catalan ("ca")
segments, info = model.transcribe("audio_in_catalan.mp3", beam_size=5, task="transcribe", language="ca")

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# `segments` is a generator: the audio is transcribed as you iterate over it
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
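
The `start`, `end` and `text` fields printed above are enough to build subtitles. The sketch below formats such segments as SubRip (SRT) text. The helpers `srt_timestamp` and `to_srt` are hypothetical names introduced here, and the input is assumed to be an iterable of `(start, end, text)` tuples so that the example runs without loading the model.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Turn (start, end, text) tuples into numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


# Stand-in data; real values would come from the transcription loop.
example = [(0.0, 2.5, " bon dia"), (2.5, 4.0, " com esteu")]
print(to_srt(example))
```

With the transcription example, the same function could be fed with `to_srt((s.start, s.end, s.text) for s in segments)`.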

## Conversion Details

### Conversion procedure

This model is not a direct result of training. It is a conversion of a fine-tuned [Whisper](https://huggingface.co/openai/whisper-large-v3) model into the format used by [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master), carried out with the `ct2-transformers-converter` tool from CTranslate2. The procedure to create the model is as follows:

```bash
ct2-transformers-converter --model BSC-LT/whisper-3cat-cv21-valencian \
   --output_dir faster-whisper-3cat-cv21-valencian \
   --copy_files preprocessor_config.json \
   --quantization float16
```
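
If the conversion needs to be scripted, for instance to repeat it for several checkpoints, the same command can be assembled in Python and passed to `subprocess`. This is a sketch only: the helper names `build_convert_command` and `convert` are hypothetical, the flags are exactly those of the command above, and running it assumes `ct2-transformers-converter` is available on your `PATH`.

```python
import subprocess
from typing import List


def build_convert_command(model: str, output_dir: str, quantization: str = "float16") -> List[str]:
    """Assemble the converter call as an argument list (no shell quoting needed)."""
    return [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
        "--copy_files", "preprocessor_config.json",
        "--quantization", quantization,
    ]


def convert(model: str, output_dir: str, quantization: str = "float16") -> None:
    """Run the conversion; raises CalledProcessError if the converter fails."""
    subprocess.run(build_convert_command(model, output_dir, quantization), check=True)


command = build_convert_command(
    "BSC-LT/whisper-3cat-cv21-valencian",
    "faster-whisper-3cat-cv21-valencian",
)
print(" ".join(command))
```

Only `build_convert_command` is exercised here; call `convert(...)` when you actually want to launch the conversion.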
  
## Citation

If this model contributes to your research, please cite the work:
<!--
```bibtex
@inproceedings{hernandez20243catparla,
  title={3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition},
  author={Hern{\'a}ndez Mena, Carlos Daniel and Armentano Oller, Carme and Solito, Sarah and K{\"u}lebi, Baybars},
  booktitle={Proc. IberSPEECH 2024},
  pages={176--180},
  year={2024}
}
```
-->
```bibtex
@misc{BSC2025-fasterwhisper3catcv21valencian,
      title={Recognition models for adaptation to Catalan variants},
      author={Hernandez Mena, Carlos Daniel and Messaoudi, Abir and Armentano, Carme and España i Bonet, Cristina},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/BSC-LT/faster-whisper-3cat-cv21-valencian},
      year={2025}
}
```

## Additional Information

### Author

The conversion was performed in June 2025 at the [Language Technologies Laboratory](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/).

### Contact
For further information, please email <[email protected]>.

### Copyright
Copyright (c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing Center.

### License

[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project ILENIA with reference 2022/TL22/00215337.

The conversion of the model was made possible by the computing time provided by the [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.

We acknowledge EuroHPC Joint Undertaking for awarding us access to MareNostrum 5 at BSC, Spain.