Zero-Shot Image Classification
OpenCLIP
clip

Model Summary

NLLB-CLIP is a model that combines a text encoder from the NLLB model and an image encoder from the standard CLIP. This allows us to extend the model capabilities to 201 languages of the Flores-200. NLLB-CLIP sets state-of-the-art on the Crossmodal-3600 dataset by performing very well on low-resource languages. You can find more details about the model in the paper.

Acknowledgements

I thank ML Collective for providing Google Cloud compute resources to train the OpenCLIP-compatible version of NLLB-CLIP.

Downloads last month
15,126
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the model is not deployed on the HF Inference API.

Dataset used to train visheratin/nllb-clip-base-oc