Czech in the public test metadata

#5
by jack-etheredge - opened

Hi,
Some of the public test metadata contains Czech. Is this expected? Most of it is in English. I'm more concerned about what this finding implies for the private leaderboard: additional unseen languages or more Czech examples. For anyone who wants to leverage the metadata, this would unnecessarily complicate things or degrade performance. Assuming there's a 1:1 mapping between the Czech and the English, it's not particularly difficult to deal with what's in the public test metadata, but again it's the possibility of unknown unknowns I find more concerning.
Best,
Jack

Bohemian Visual Recognition Alliance org

Hi @jack-etheredge ,

I did look into that, and you are right on this one. This was not intended. Thank you for reporting it!

Since those are 1:1 translations, I should update the file pretty quickly. I will update you through this thread once it is done—hopefully later today.

Best,
Lukas

Thanks!

Do you mind making the updated metadata available for the public leaderboard?

For example, by updating the file here: https://huggingface.co/picekl/FungiCLEF2024-Sample_Submission/tree/main

Bohemian Visual Recognition Alliance org

Hi @jack-etheredge ,

Finally, I fixed it :)
You can find the updated metadata file in the sample repo.

Best,
Lukas

Hi Lukas,

There's still some Czech in there in the Habitat column (and thus presumably in the Habitat column for the private leaderboard?).

Some examples:
smíšený les
jehličnatý les / monokultura
acidofilní / kyselá doubrava
louka/trávník
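
For anyone who wants to handle this on their side in the meantime, here is a rough sketch of the kind of workaround I mean. The English strings are my own literal translations of the four examples above, not confirmed labels from the official metadata, and the function names and the extra value "bažina" in the usage note are made up for illustration.

```python
# Hypothetical mapping: the English strings are literal translations of the
# four Czech examples, NOT confirmed labels from the official metadata.
CZECH_TO_ENGLISH = {
    "smíšený les": "mixed forest",
    "jehličnatý les / monokultura": "coniferous forest / monoculture",
    "acidofilní / kyselá doubrava": "acidophilous / acidic oak forest",
    "louka/trávník": "meadow/lawn",
}


def translate_habitat(value):
    """Return the English habitat for a known Czech value, else the input unchanged."""
    return CZECH_TO_ENGLISH.get(value.strip(), value)


def find_suspect_values(values):
    """Return sorted unique values that still contain non-ASCII characters after translation."""
    translated = {translate_habitat(v) for v in values}
    return sorted(v for v in translated if not v.isascii())
```

Calling `find_suspect_values(["smíšený les", "bažina", "Forest"])` returns `["bažina"]`, because that value has no entry in the mapping. The non-ASCII check is only a heuristic: a Czech word without diacritics would slip through, so it does not rule out the unknown unknowns.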

Best,
Jack

Bohemian Visual Recognition Alliance org

Hi @jack-etheredge ,

Can you please check the new update? It should work now.

Best,
Lukas
