CLIP ViT-L/14 finetune: SAE-informed adversarial training
SAE = Sparse autoencoder
Accuracy on ImageNet/ObjectNet: my GmP fine-tune: 91% > SAE (this model): 89% > OpenAI pre-trained: 84.5%
Still, it's fun to use with e.g. Flux.1 - get the Text-Encoder-only (TE) version ⬇️ and try it!
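For Flux.1, here is a minimal sketch of that idea via 🤗 diffusers. It assumes the HF-format weights in this repo load with transformers' `CLIPTextModel`; the Flux checkpoint id, dtype, prompt, and filenames are illustrative placeholders, not taken from this card.

```python
# Hedged sketch: swap this SAE fine-tune in as Flux.1's CLIP-L text encoder.
# Assumptions: the repo id below is the HF-format conversion of this model, and
# "black-forest-labs/FLUX.1-dev" stands in for whichever Flux.1 checkpoint you use.
import torch
from transformers import CLIPTextModel
from diffusers import FluxPipeline

clip_te = CLIPTextModel.from_pretrained(
    "zer0int/CLIP-SAE-ViT-L-14", torch_dtype=torch.bfloat16
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=clip_te,  # override the default CLIP-L text encoder
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a photo of a cat made of text", num_inference_steps=28).images[0]
image.save("flux_sae_clip.png")
```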
This SAE CLIP also has the best linear-probe results on LAION-AI/CLIP_benchmark (see below).
This CLIP (direct download) is also the best CLIP to use for HunyuanVideo.
Required: Use it with my zer0int/ComfyUI-HunyuanVideo-Nyan node (it changes the relative influence of the LLM vs. CLIP; otherwise, the difference is very small).
- Interesting adversarial-robustness experiments to try: right-click and download the individual images: Image 1 -- Image 2 -- Image 3
- Upload each into the zero-shot classification widget [hopefully available soon on the right here ->]
- Try labels (class names): a photo of a cat, a photo of a dog, a photo of a text
- Repeat the same with e.g. my GmP models and see what happens. =)
- I'm really hoping the HF-format .safetensors conversion didn't mess anything up (it happens!); just in case it did, or if there's no Inference API widget available to use (see the local zero-shot sketch after this list):
- I put a script that does the same thing (on the un-converted model) in my GitHub repo. Plus, you can reproduce the fine-tune yourself, as that code is also available! 🤗
- 👉 All training info & code: github.com/zer0int/CLIP-SAE-finetune
- ☕ Buy me a coffee
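If the Inference API widget isn't available, a minimal local zero-shot sketch with 🤗 transformers is below, using the labels from the list above. It assumes the HF-converted weights in this repo load with `CLIPModel`; the image filename is a placeholder for one of the downloaded images.

```python
# Hedged sketch: local zero-shot classification with the HF-converted weights.
# Assumption: "zer0int/CLIP-SAE-ViT-L-14" is the HF-format repo id; swap in a
# local path if you downloaded the .safetensors manually.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "zer0int/CLIP-SAE-ViT-L-14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

labels = ["a photo of a cat", "a photo of a dog", "a photo of a text"]
image = Image.open("image1.png")  # one of the downloaded adversarial images

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: (1, num_labels) -> softmax over labels
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.3f}")
```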
Base model: openai/clip-vit-large-patch14