---
language:
- ca
datasets:
- projecte-aina/3catparla_asr
- projecte-aina/corts_valencianes_asr_a
tags:
- audio
- automatic-speech-recognition
- whisper-large-v3
- barcelona-supercomputing-center
license: apache-2.0
library_name: transformers
base_model:
- openai/whisper-large-v3
---
# faster-whisper-3cat-cv21-valencian

## Table of Contents
<details>
<summary>Click to expand</summary>

- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Conversion Details](#conversion-details)
- [Citation](#citation)
- [Additional Information](#additional-information)

</details>

## Model Description

"BSC-LT/faster-whisper-3cat-cv21-valencian" is an acoustic model based on a [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master) version of [BSC-LT/whisper-3cat-cv21-valencian](https://huggingface.co/langtech-veu/whisper-3cat-cv21-valencian).

## Intended Uses and Limitations
This model is the result of converting [BSC-LT/whisper-3cat-cv21-valencian](https://huggingface.co/langtech-veu/whisper-3cat-cv21-valencian) into a lighter model using the Python module [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).
The model can be used for Automatic Speech Recognition (ASR) in Catalan, with a focus on the Valencian variant. It is intended to transcribe Catalan audio files to plain text without punctuation.

## How to Get Started with the Model

<!--
To see an updated and functional version of this code, please visit our [Notebook](https://colab.research.google.com/drive/1MHiPrffNTwiyWeUyMQvSdSbfkef_8aJC?usp=sharing)
-->

### Installation

To use this model, you first need to install [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master).

Create a virtual environment:
```bash
python -m venv /path/to/venv
```
Activate the environment:
```bash
source /path/to/venv/bin/activate
```
Install the package:
```bash
pip install faster-whisper
```
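
Optionally, before choosing the `device` and `compute_type` arguments in the inference example below, you can check whether CTranslate2 (installed as a dependency of faster-whisper) sees a CUDA GPU. A minimal sketch:

```python
import ctranslate2

# Count the CUDA devices visible to CTranslate2; if this prints 0,
# use device="cpu" (for example with compute_type="int8") in the example below.
print("CUDA devices visible to CTranslate2:", ctranslate2.get_cuda_device_count())
```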

### For Inference
To transcribe audio in Catalan using this model, you can follow this example:

```python
from faster_whisper import WhisperModel

model_size = "BSC-LT/faster-whisper-3cat-cv21-valencian"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio_in_catalan.mp3", beam_size=5, task="transcribe", language="ca")

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
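
Since the model is intended to produce plain text, you may also want to join the segments into a single transcript. A minimal sketch that continues from the example above (the output filename is just an illustration):

```python
# `segments` is a lazy generator that can only be iterated once, so collect the
# text while iterating (use this in place of the printing loop above).
texts = [segment.text.strip() for segment in segments]

# Join everything into one plain-text transcript and save it.
with open("transcript_ca.txt", "w", encoding="utf-8") as f:
    f.write(" ".join(texts) + "\n")
```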

## Conversion Details

### Conversion Procedure

This model is not a direct result of training. It is a conversion of [BSC-LT/whisper-3cat-cv21-valencian](https://huggingface.co/langtech-veu/whisper-3cat-cv21-valencian), a fine-tuned [Whisper large-v3](https://huggingface.co/openai/whisper-large-v3) model, using [faster-whisper](https://github.com/guillaumekln/faster-whisper/tree/master). The procedure to create the model is as follows:

```bash
ct2-transformers-converter --model BSC-LT/whisper-3cat-cv21-valencian \
    --output_dir faster-whisper-3cat-cv21-valencian \
    --copy_files preprocessor_config.json \
    --quantization float16
```
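
The same conversion can also be run from Python through CTranslate2, which faster-whisper is built on. A hedged sketch mirroring the command above (it assumes CTranslate2's `TransformersConverter` exposes `copy_files` and `quantization` options corresponding to the CLI flags, and it requires `transformers` to be installed):

```python
from ctranslate2.converters import TransformersConverter

# Convert the Hugging Face checkpoint to the CTranslate2 format used by faster-whisper.
converter = TransformersConverter(
    "BSC-LT/whisper-3cat-cv21-valencian",       # source model on the Hugging Face Hub
    copy_files=["preprocessor_config.json"],    # assumed to mirror --copy_files
)
converter.convert("faster-whisper-3cat-cv21-valencian", quantization="float16")
```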

## Citation

If this model contributes to your research, please cite the work:
<!--
```bibtex
@inproceedings{hernandez20243catparla,
  title={3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition},
  author={Hern{\'a}ndez Mena, Carlos Daniel and Armentano Oller, Carme and Solito, Sarah and K{\"u}lebi, Baybars},
  booktitle={Proc. IberSPEECH 2024},
  pages={176--180},
  year={2024}
}
```
-->
```bibtex
@misc{BSC2025-whisper3catcv21valencian,
  title={Recognition models for adaptation to Catalan variants},
  author={Hernandez Mena, Carlos Daniel and Messaoudi, Abir and Armentano Oller, Carme and España i Bonet, Cristina},
  organization={Barcelona Supercomputing Center},
  url={https://huggingface.co/BSC-LT/faster-whisper-3cat-cv21-valencian},
  year={2025}
}
```

## Additional Information

### Author

The conversion process was performed during June 2025 at the [Language Technologies Laboratory](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/).

### Contact
For further information, please email <[email protected]>.

### Copyright
Copyright (c) 2025 by the Language Technologies Laboratory, Barcelona Supercomputing Center.

### License

[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the European Union (NextGenerationEU) within the framework of the project ILENIA, reference 2022/TL22/00215337.

The conversion of the model was possible thanks to the computing time provided by the [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.

We acknowledge the EuroHPC Joint Undertaking for awarding us access to MareNostrum 5, hosted by BSC, Spain.