Adding AraBERT

#3
by wissamantoun - opened

Hey, would it be possible, if you have time, to add the AraBERT tokenizers? You can use the AraBERT preprocessing code to make installation and tokenization easier. Thanks a lot for this resource.

They are already there.

Well, even when I visit the page and refresh, it doesn't appear.


Also, the v1 and v2 models need the AraBERT preprocessor.
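
To illustrate what "needing the preprocessor" means, here is a minimal sketch of running a model-specific preprocessing step before tokenization. The helper `tokenize_with_preprocessing` and the two toy stand-ins are hypothetical names made up for this example, not the leaderboard's actual code, and the toy preprocessor only collapses whitespace, which is far less than the real AraBERT preprocessor does.

```python
from typing import Callable, List, Optional


def tokenize_with_preprocessing(
    text: str,
    tokenize: Callable[[str], List[str]],
    preprocess: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Apply an optional model-specific preprocessing step, then tokenize."""
    if preprocess is not None:
        text = preprocess(text)
    return tokenize(text)


# Hypothetical stand-ins so the sketch runs without downloading anything.
def toy_preprocess(text: str) -> str:
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


def toy_tokenize(text: str) -> List[str]:
    # Naive tokenizer: split on single spaces only.
    return text.split(" ")


raw = "مرحبا   بالعالم"  # three spaces between the two words
tokens_without = tokenize_with_preprocessing(raw, toy_tokenize)
tokens_with = tokenize_with_preprocessing(raw, toy_tokenize, toy_preprocess)
```

With the real packages, `preprocess` would presumably be something like `ArabertPreprocessor(model_name="aubmindlab/bert-base-arabertv2").preprocess` from the `arabert` package, and `tokenize` the `tokenize` method of the matching `AutoTokenizer`. Skipping the preprocessing step changes the token sequence, so token counts for those models would likely not be comparable.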

I updated a clone of the leaderboard code to correctly support AraBERT models with preprocessing. I can submit a pull request if you want. https://huggingface.co/spaces/wissamantoun/arabic-tokenizers-leaderboard

arabertv02 uses a custom processor/tokenizer, which is why it isn't being added.
Open a PR adding it if you want it included (keep your changes to a minimum if you want it approved).
