Add errata for incorrect ISO language codes for Hebrew/Javanese
See: https://github.com/speechbrain/speechbrain/issues/2396
I have not changed the labels themselves, to avoid breaking compatibility with any code that may rely on them.
Will also create a PR in the main repo for this recipe.
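Because the labels stay as they are, code that consumes them can correct the two codes at the point of use. The sketch below is one possible workaround, not part of this change: `ISO_CODE_FIXES` and `normalize_label` are hypothetical names (not SpeechBrain APIs), and it assumes a label is either a bare code such as `iw` or a `code: Name` string such as `iw: Hebrew`.

```python
# Hypothetical helper, not part of SpeechBrain: maps the two affected codes
# to their current ISO 639-1 equivalents and leaves every other label alone.
ISO_CODE_FIXES = {"iw": "he", "jw": "jv"}


def normalize_label(label: str) -> str:
    """Return `label` with an obsolete or incorrect leading code corrected.

    Accepts a bare code ("iw") or a "code: Name" string ("iw: Hebrew");
    the second form is an assumption about how labels are stored.
    """
    # Split off everything before the first colon; `sep` is "" for bare codes.
    code, sep, rest = label.partition(":")
    code = code.strip()
    fixed = ISO_CODE_FIXES.get(code, code)
    return f"{fixed}{sep}{rest}"


print(normalize_label("iw: Hebrew"))
print(normalize_label("jw"))
print(normalize_label("th: Thai"))
```

Applying the mapping on the consumer side keeps the model's label files untouched, so existing code that matches on `iw` or `jw` keeps working.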
README.md
CHANGED
@@ -316,6 +316,8 @@ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling
 The system is trained with recordings sampled at 16kHz (single channel).
 The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file* if needed. Make sure your input tensor is compliant with the expected sampling rate if you use *encode_batch* and *classify_batch*.
 
+Warning: In the dataset and in the defaults of this model (see [`label_encoder.txt`](label_encoder.txt)), the ISO language code used for Hebrew is obsolete (should be `he` instead of `iw`). The ISO language code for Javanese is incorrect (should be `jv` instead of `jw`). See [issue #2396](https://github.com/speechbrain/speechbrain/issues/2396).
+
 #### Limitations and bias
 
 Since the model is trained on VoxLingua107, it has many limitations and biases, some of which are: