I'm freaked out, what model should I use, the newest?

#1
by pikkaa

Which Long-CLIP-L should I download and use with SDXL or Flux? Does each have its own advantages, or should I just get the newest? Over 1 hour navigating your page... lost...

Thanks man, I have no idea what you're saying and still have no idea which one to use, or try to use with Flux. What is for what! Is it worth the hassle! (sorry for my weak language)

In that case [text-to-image, text-to-video, ...], you just need the Text Encoders, where available.

So, in other words, if you try these 3, you'll have tried all the Long-CLIP-L models I've fine-tuned. Hope that helps! :)

https://huggingface.co/zer0int/LongCLIP-Registers-Gated_MLP-ViT-L-14/resolve/main/Long-ViT-L-14-REG-TE-only-HF-format.safetensors?download=true

https://huggingface.co/zer0int/LongCLIP-SAE-ViT-L-14/resolve/main/Long-ViT-L-14-GmP-SAE-TE-only.safetensors?download=true

https://huggingface.co/zer0int/LongCLIP-GmP-ViT-L-14/resolve/main/Long-ViT-L-14-GmP-ft.safetensors?download=true
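If you want to sanity-check whichever file you grab, here's a minimal sketch (an ad-hoc example, not code from any of the repos) that downloads the SAE TE-only checkpoint from the second link and confirms it contains no vision-tower tensors. It assumes `huggingface_hub` and `safetensors` are installed, and that vision-tower keys use the OpenAI-style `visual.` prefix:

```python
# Minimal sketch: download a TE-only checkpoint and verify it really is
# text-encoder-only. Repo/filename are taken from the SAE link above;
# swap in either of the other two files the same way.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download(
    repo_id="zer0int/LongCLIP-SAE-ViT-L-14",
    filename="Long-ViT-L-14-GmP-SAE-TE-only.safetensors",
)

with safe_open(path, framework="pt") as f:
    keys = list(f.keys())

# "visual." is an assumption about the state-dict key naming.
vision_keys = [k for k in keys if k.startswith("visual.")]
print(f"{len(keys)} tensors total, {len(vision_keys)} from the vision tower")
# Expect 0 vision tensors; then drop the file into ComfyUI's models/clip/
# folder and select it in your CLIP loader node.
```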

Same thing here: I was confused, then it made sense, but here I am again, clueless...

https://huggingface.co/zer0int/LongCLIP-SAE-ViT-L-14/resolve/main/model.safetensors I am using this one right now for HiDream and it works. But is LongCLIP-Registers-Gated_MLP-ViT-L-14 better than the SAE one?

I am using it in combination with CLIP-ViT-bigG-14-laion2B-39B-b160k-FP32. Now I also want to know whether any of your models are capable of NSFW; truth be told, that's why I am scouring for CLIP models 😅

CLIP models are a deep rabbit hole, and any help would be very much appreciated. Have a good day, man, and thank you for all the work you do!

@compan - That's not an issue as far as CLIP is concerned. CLIP was trained on an unfiltered dataset; see this paper for details (including visual examples, page 5).
I did NOT censor CLIP in any way, nor did I explicitly train it to be 'more lewd'. My models can do all those things (provide embeddings that will guide a diffusion model toward NSFW content); it comes down to your diffusion model (or other text encoders like T5), which may have NSFW censorship.

If you nevertheless want to train a CLIP (on anything you like), you can use my code on GitHub. With 24 GB VRAM and a dataset of 40,000 images, it takes ~6 hours on an RTX 4090. Hope that helps!
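To give a rough idea of what such a fine-tune boils down to, here's a bare-bones sketch using Hugging Face `transformers` (a generic example, not the actual code from the GitHub repo; the dataset plumbing is omitted and the base checkpoint here is just OpenAI's stock ViT-L/14):

```python
# Bare-bones CLIP fine-tuning step: symmetric contrastive loss between
# image and text embeddings. Generic sketch, not the repo's script.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").cuda()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

def train_step(images, texts):
    # The processor resizes/normalizes the images and tokenizes the captions.
    inputs = processor(text=texts, images=images, return_tensors="pt",
                       padding=True, truncation=True).to("cuda")
    # return_loss=True makes the forward pass compute CLIP's
    # image<->text contrastive loss directly.
    out = model(**inputs, return_loss=True)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```

Loop something like that over your image-caption pairs for a few epochs and you land in the ballpark of the numbers above.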

@zer0int Wow, I didn't expect such a detailed response, man. Thank you a lot, I will read the paper, for my curiosity is boundless lol. I figured the NSFW stuff is constrained by the text encoders like T5 and Llama and by the base diffusion model, but I'm more of a "measure three times, cut once" type of guy lmao, so I had to ask. Thank you, man.

@miasik Holy shit, such a detailed guide. The explanation of each model, their advantages, and which one you should choose. Wow. Even my smooth brain understood everything. You need to write more guides, man.

I just instructed Gemini's Deep Research ;-)
My reason was the same as yours: it was blowing my mind trying to choose the proper model for my needs.

@miasik Tell me about it, man. As you can see, I've been hoarding a couple of them and still wasn't 100% sure I'd picked the right ones, but now with your guide it's crystal clear. Trying out the newest model is like going to school again, except you're starting in 1st grade xD I love it though :D Shame there aren't many CLIP-H models for WAN.

@miasik Yeah, I picked that one after I read your guide. I just took the full FP32 model and converted it to BF16; I didn't remove the vision layers since it doesn't error out in Comfy 💀 I do wonder if it would be better to make it a text encoder only xD I can attest to your guide's point about the disadvantage of LongCLIP: text is more legible with the regular CLIP. The reason I didn't pick that one is the mixed precision, so I went with the full model.
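For reference, the conversion is a short job with `safetensors`; this sketch also shows the text-encoder-only variant I was wondering about (the filenames are placeholders, and the "visual." prefix is my assumption about how the state dict is laid out):

```python
# Cast a full FP32 CLIP checkpoint to BF16 and optionally drop the
# vision tower so only the text encoder remains. Filenames are
# placeholders; "visual." assumes OpenAI-style key naming.
import torch
from safetensors.torch import load_file, save_file

state = load_file("model.safetensors")  # full FP32 checkpoint

te_only_bf16 = {
    k: v.to(torch.bfloat16)
    for k, v in state.items()
    if not k.startswith("visual.")  # drop vision-tower tensors
}

save_file(te_only_bf16, "model-TE-only-BF16.safetensors")
```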

Btw, do you use any CLIP-G models other than the default? I'm using CLIP-ViT-bigG-14-laion2B-39B-b160k and it works great; just exchanging some thoughts.

@compan
No, I don't change CLIP-G.
Every time you use an external CLIP (instead of the one integrated into the model), there is a chance that you lose some details or that the model changes its behavior.
So testing every time is the only way to answer your question.
