[Model request] Saily 100b, Saily 220b
The size of saily-100b-Q2_K.gguf is 49.6 GB, so I expect that all layers would fit fully on a dual RTX 3090/4090 setup or on Google Colab Pro if you quantize this model with IQ2_XXS (2.06 bpw).
Similarly, the size of saily_220b.Q2_K.gguf is 87.80 GB, so I think it would be possible to load all layers on a RunPod A100 80 GB if you quantize it with IQ2_XXS (2.06 bpw).
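As a rough sanity check of those claims, here is a minimal back-of-envelope sketch that scales the known Q2_K file sizes by the ratio of bits per weight. The 2.06 bpw figure for IQ2_XXS is taken from the request above; the ~3.0 bpw effective average for Q2_K is an assumption (Q2_K keeps some tensors at higher precision, so its real average sits above the nominal 2.56 bpw), and the estimate ignores KV cache and context overhead.

```python
# Hedged size estimate: scale the reported Q2_K file sizes by the bpw ratio.
Q2K_BPW_ASSUMED = 3.0   # assumed effective average for Q2_K, not an exact figure
IQ2_XXS_BPW = 2.06      # figure quoted in the request above

models = {
    # name: (Q2_K file size in GB, target VRAM in GB)
    "saily-100b": (49.6, 48.0),   # dual RTX 3090/4090 -> 2 x 24 GB
    "saily_220b": (87.8, 80.0),   # single A100 80 GB on RunPod
}

for name, (q2k_gb, vram_gb) in models.items():
    est_gb = q2k_gb * IQ2_XXS_BPW / Q2K_BPW_ASSUMED
    fits = "fits" if est_gb < vram_gb else "does NOT fit"
    print(f"{name}: ~{est_gb:.1f} GB at IQ2_XXS -> {fits} in {vram_gb:.0f} GB "
          f"(before KV cache / context overhead)")
```

Under these assumptions the 100B model comes out around 34 GB and the 220B model around 60 GB, which is consistent with fitting in 48 GB and 80 GB of VRAM respectively, leaving some headroom for the KV cache.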
Besides that, my personal wish is that you would also make IQ2_XXS GGUF versions of mixtral-7b×8-instruct-LimaRP-zloss and Openbuddy-mixtral-7b×8-v16.3-32k. That would be useful for users with 16 GB graphics cards. Thank you.
100B+ models go beyond the computational resources I have available. But I have now contributed everything required for preparing such models to the llama.cpp project, so hopefully someone with access to machines with more RAM/VRAM/compute can do this.