What are folks' opinions on 4KM quants? Are they viable?

by Permahuman - opened 2 days ago

The question is in the discussion title. I really want the whole model to fit on a 3090 without offloading to RAM, so I can get those legendary token generation speeds. I have heard that quality may drop off below 5- or 6-bit quants. Has anyone tried a 4KM (Q4_K_M) quant yet? I have a really bad rural internet connection, so I would really appreciate some feedback before I commit to a download.
