[Suggestion Needed]

#2
by Darkknight535 - opened

Which model under 70B do you prefer that is like GeneticLemonade v3 70B? I like that model a lot. It's very good at wholesome replies, but I can only run it at IQ3_XS and at 2.5 t/s, which is too slow for me. :\

It might be recency bias but I've been running this model for the past week myself.

For something that feels more like GeneticLemonade but smaller, your best bet would probably be Nemotron Super v1.5, the pruned Llama 3.3 model, or one of Drummer's Valkyrie tunes of it. I believe he's currently working on a V2.

GLM 4.5 Air is possibly an option. If you have a 3090 and some fast RAM, offloading with that setup would probably get you around 4-5 t/s.

I wish I could. The issue is that I have 2 T4 GPUs (2x16 GB, 32 GB VRAM total), and they're Turing architecture, which is slow for LLMs these days. :\ Thanks for the suggestion, buddy. I'll give this one another shot.

Darkknight535 changed discussion status to closed
