I'm freaked out, what model should I use, the newest?

#1
by pikkaa

Which Long-CLIP-L should I download and use with SDXL or Flux? Does each have its own advantages, or should I just get the newest? Over 1 hour navigating your page... lost...

Thanks man, I have no idea what you're saying and still have no idea which one to use, or try to use with Flux. What is for what! Is it worth the hassle! (sorry for my weak language)

In that case [text-to-image, text-to-video, ...], you just need the Text Encoders, where available.

So, in other words, if you try these 3, you'll have tried all the Long-CLIP-L models I've fine-tuned. Hope that helps! :)

https://huggingface.co/zer0int/LongCLIP-Registers-Gated_MLP-ViT-L-14/resolve/main/Long-ViT-L-14-REG-TE-only-HF-format.safetensors?download=true

https://huggingface.co/zer0int/LongCLIP-SAE-ViT-L-14/resolve/main/Long-ViT-L-14-GmP-SAE-TE-only.safetensors?download=true

https://huggingface.co/zer0int/LongCLIP-GmP-ViT-L-14/resolve/main/Long-ViT-L-14-GmP-ft.safetensors?download=true
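If you want to sanity-check whichever file you grab, here's a minimal sketch (an ad-hoc example, not code from any of the repos) that downloads the SAE TE-only checkpoint from the second link and confirms it contains no vision-tower tensors. It assumes `huggingface_hub` and `safetensors` are installed, and that vision-tower keys use the OpenAI-style `visual.` prefix:

```python
# Minimal sketch: download a TE-only checkpoint and verify it really is
# text-encoder-only. Repo/filename are taken from the SAE link above;
# swap in either of the other two files the same way.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download(
    repo_id="zer0int/LongCLIP-SAE-ViT-L-14",
    filename="Long-ViT-L-14-GmP-SAE-TE-only.safetensors",
)

with safe_open(path, framework="pt") as f:
    keys = list(f.keys())

# "visual." is an assumption about the state-dict key naming.
vision_keys = [k for k in keys if k.startswith("visual.")]
print(f"{len(keys)} tensors total, {len(vision_keys)} from the vision tower")
# Expect 0 vision tensors; then drop the file into ComfyUI's models/clip/
# folder and select it in your CLIP loader node.
```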

Same thing here: I was confused, then it made sense, but here I am again, clueless...

https://huggingface.co/zer0int/LongCLIP-SAE-ViT-L-14/resolve/main/model.safetensors I am using this one right now for HiDream and it works. But is LongCLIP-Registers-Gated_MLP-ViT-L-14 better than the SAE one?

I am using it in combination with CLIP-ViT-bigG-14-laion2B-39B-b160k-FP32. Now I also want to know whether any of your models are capable of NSFW; truth be told, that's why I am scouring for CLIP models 😅

CLIP models are a deep rabbit hole, and any help would be very much appreciated. Have a good day, man, and thank you for all the work you do!

@compan - That's not an issue as far as CLIP is concerned. CLIP was trained on an unfiltered dataset; see this paper for details (including visual examples, page 5).
I did NOT censor CLIP in any way, nor did I explicitly train it to be 'more lewd'. My models can do all those things (provide embeddings that will guide a diffusion model toward NSFW content); it comes down to your diffusion model (or other text encoders like T5), which may have NSFW censorship.

If you nevertheless want to train a CLIP (on anything you like), you can use my code on GitHub. With 24 GB VRAM and a dataset of 40,000 images, it takes ~6 hours on an RTX 4090. Hope that helps!
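To give a rough idea of what such a fine-tune boils down to, here's a bare-bones sketch using Hugging Face `transformers` (a generic example, not the actual code from the GitHub repo; the dataset plumbing is omitted and the base checkpoint here is just OpenAI's stock ViT-L/14):

```python
# Bare-bones CLIP fine-tuning step: symmetric contrastive loss between
# image and text embeddings. Generic sketch, not the repo's script.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").cuda()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

def train_step(images, texts):
    # The processor resizes/normalizes the images and tokenizes the captions.
    inputs = processor(text=texts, images=images, return_tensors="pt",
                       padding=True, truncation=True).to("cuda")
    # return_loss=True makes the forward pass compute CLIP's
    # image<->text contrastive loss directly.
    out = model(**inputs, return_loss=True)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```

Loop something like that over your image-caption pairs for a few epochs and you land in the ballpark of the numbers above.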

@zer0int Wow, I didn't expect such a detailed response, man. Thank you a lot, I will read the paper, for my curiosity is boundless lol. I figured the NSFW stuff is constrained by the text encoders like T5 and Llama and by the base diffusion model, but I'm more of a "measure three times, cut once" type of guy lmao, so I had to ask. Thank you, man.

@miasik Holy shit, such a detailed guide. The explanation of each model, their advantages, and which one you should choose. Wow. Even my smooth brain understood everything. You need to write more guides, man.

I just instructed Gemini's Deep Research ;-)
My reason was the same as yours: it was blowing my mind trying to choose the proper model for my needs.

@miasik Tell me about it, man. As you can see, I've been hoarding a couple of them and still wasn't 100% sure I'd picked the right ones, but now with your guide it's crystal clear. Trying out the newest model is like going to school again, except you're starting in 1st grade xD I love it though :D Shame there aren't many CLIP-H models for WAN.

@miasik Yeah, I picked that one after I read your guide. I just took the full FP32 model and converted it to BF16; I didn't remove the vision layers since it doesn't error out in Comfy 💀 I do wonder if it would be better to make it a text encoder only xD I can attest to your guide's point about the disadvantage of LongCLIP: text is more legible with the regular CLIP. The reason I didn't pick that one is the mixed precision, so I went with the full model.
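For reference, the conversion is a short job with `safetensors`; this sketch also shows the text-encoder-only variant I was wondering about (the filenames are placeholders, and the "visual." prefix is my assumption about how the state dict is laid out):

```python
# Cast a full FP32 CLIP checkpoint to BF16 and optionally drop the
# vision tower so only the text encoder remains. Filenames are
# placeholders; "visual." assumes OpenAI-style key naming.
import torch
from safetensors.torch import load_file, save_file

state = load_file("model.safetensors")  # full FP32 checkpoint

te_only_bf16 = {
    k: v.to(torch.bfloat16)
    for k, v in state.items()
    if not k.startswith("visual.")  # drop vision-tower tensors
}

save_file(te_only_bf16, "model-TE-only-BF16.safetensors")
```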

Btw, do you use any CLIP-G models other than the default? I'm using CLIP-ViT-bigG-14-laion2B-39B-b160k and it works great; just exchanging some thoughts.

@compan
No, I don't change CLIP-G.
Every time you use an external CLIP (instead of the one integrated into the model), there is a chance that you lose some details or that the model changes its behavior.
So testing every time is the only way to answer your question.
