Add errata for incorrect ISO language codes for Hebrew/Javanese

See: https://github.com/speechbrain/speechbrain/issues/2396

I have not changed the labels themselves, to avoid breaking compatibility with any existing code that may rely on them.
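Because the labels keep the legacy `iw`/`jw` codes, downstream code that needs standard ISO 639-1 codes can remap them after prediction. Below is a minimal sketch that assumes a predicted label is either a bare code (e.g. `iw`) or a `code: Name` string (e.g. `iw: Hebrew`). The mapping and the `normalize_label` helper are hypothetical illustrations, not part of SpeechBrain.

```python
# Hypothetical mapping from the legacy VoxLingua107 codes used by this model
# to the correct ISO 639-1 codes. The model's own labels are left unchanged.
LEGACY_TO_ISO_639_1 = {
    "iw": "he",  # Hebrew: "iw" is the withdrawn code, "he" is current
    "jw": "jv",  # Javanese: "jw" is not an ISO 639-1 code, "jv" is correct
}


def normalize_label(label: str) -> str:
    """Replace a legacy code in a predicted label, keeping any ': Name' suffix."""
    code, sep, rest = label.partition(":")
    code = code.strip()
    fixed = LEGACY_TO_ISO_639_1.get(code, code)
    # Re-attach the language name only if the label had one.
    return fixed + sep + rest if sep else fixed


print(normalize_label("iw: Hebrew"))  # he: Hebrew
print(normalize_label("jw"))          # jv
print(normalize_label("en: English")) # en: English
```

Codes outside the mapping pass through untouched, so the helper can be applied to every prediction without special-casing.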

I will also open a PR for this recipe in the main SpeechBrain repository.

Files changed (1)

README.md +2 -0

README.md CHANGED

@@ -316,6 +316,8 @@ To perform inference on the GPU, add  `run_opts={"device":"cuda"}`  when calling
 The system is trained with recordings sampled at 16kHz (single channel).
 The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file* if needed. Make sure your input tensor is compliant with the expected sampling rate if you use *encode_batch* and *classify_batch*.
+Warning: In the dataset and in the defaults of this model (see [`label_encoder.txt`](label_encoder.txt)), the ISO language code used for Hebrew is obsolete (it should be `he` instead of `iw`), and the ISO language code used for Javanese is incorrect (it should be `jv` instead of `jw`). See [issue #2396](https://github.com/speechbrain/speechbrain/issues/2396).
 #### Limitations and bias
 Since the model is trained on VoxLingua107, it has many limitations and biases, some of which are: