sdelangen committed on
Commit aa7d510 (1 parent: 0be8248)

Add errata for incorrect ISO language codes for Hebrew/Javanese

See: https://github.com/speechbrain/speechbrain/issues/2396

I have not changed the labels themselves to avoid breaking compatibility with code that would hypothetically make use of these labels.

Will also create a PR in the main repo for this recipe.

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -316,6 +316,8 @@ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling
 The system is trained with recordings sampled at 16kHz (single channel).
 The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file* if needed. Make sure your input tensor is compliant with the expected sampling rate if you use *encode_batch* and *classify_batch*.
 
+Warning: In the dataset and in the defaults of this model (see [`label_encoder.txt`](label_encoder.txt), the used ISO language code for Hebrew is obsolete (should be `he` instead of `iw`). The ISO language code for Javanese is incorrect (should be `jv` instead of `jw`). See [issue #2396](https://github.com/speechbrain/speechbrain/issues/2396).
+
 #### Limitations and bias
 
 Since the model is trained on VoxLingua107, it has many limitations and biases, some of which are:
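Because this commit deliberately leaves the labels themselves unchanged, downstream code that needs standard ISO 639-1 codes could remap them after prediction. The following is a minimal sketch, assuming a predicted label is either a bare code (e.g. `iw`) or of the form `code: Name` (e.g. `iw: Hebrew`). `LEGACY_TO_ISO639_1` and `normalize_lang_label` are hypothetical names for illustration, not part of SpeechBrain.

```python
# Hypothetical mapping (not part of SpeechBrain) from the legacy codes
# used by this model's label encoder to current ISO 639-1 codes.
LEGACY_TO_ISO639_1 = {
    "iw": "he",  # Hebrew: "iw" is the obsolete code
    "jw": "jv",  # Javanese: "jw" is incorrect
}


def normalize_lang_label(label: str) -> str:
    """Return the label with a corrected language code.

    Assumes the label is either a bare code ("iw") or "code: Name"
    ("iw: Hebrew"); any other code is returned unchanged.
    """
    code, sep, name = label.partition(":")
    code = code.strip()
    fixed = LEGACY_TO_ISO639_1.get(code, code)
    # Keep the ": Name" suffix if the label had one.
    return f"{fixed}:{name}" if sep else fixed


print(normalize_lang_label("iw: Hebrew"))  # he: Hebrew
print(normalize_lang_label("jw"))          # jv
```

Applying the remapping only at output time keeps `label_encoder.txt` untouched, which matches the commit's goal of not breaking code that relies on the existing labels.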