Works very well on M3 Pro with 18GB RAM

#1 opened by peterai6377

The original gpt-oss-20b runs terribly on my Mac, but this quantized model gives me 40+ tokens/sec. Thanks!
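For anyone wanting to reproduce the speed numbers, here is a minimal sketch using the mlx-lm Python package; the model id below is a placeholder for this repo, and the prompt is arbitrary.

```python
# Minimal sketch: load the 4-bit MLX model and check generation speed.
# Assumes `pip install mlx-lm`; the model id is a placeholder for this repo.
from mlx_lm import load, generate

model, tokenizer = load("your-username/gpt-oss-20b-mlx-4bit")  # placeholder id

# verbose=True prints the output along with prompt/generation tokens-per-second,
# which is where the 40+ tok/s figure comes from.
text = generate(
    model,
    tokenizer,
    prompt="Explain quantization in one paragraph.",
    max_tokens=256,
    verbose=True,
)
```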


Glad to hear it worked better. The speed gains with MLX are definitely noticeable.

I also tried making 2-bit and 3-bit mixed versions, since I'd love to get this running in 8GB of RAM on older M1 MacBooks, but 4-bit retained the most coherence and output quality in testing.
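For reference, here is a sketch of how such a conversion can be produced with mlx-lm's convert utility; the output path is a placeholder, and q_bits is the knob that would change between the 4-bit, 3-bit, and 2-bit attempts described above.

```python
# Sketch of producing a quantized MLX conversion with mlx-lm.
# Output path is a placeholder; q_bits selects the quantization width.
from mlx_lm import convert

# 4-bit conversion (the variant that held up best in my testing).
convert(
    hf_path="openai/gpt-oss-20b",      # upstream weights
    mlx_path="gpt-oss-20b-mlx-4bit",   # output directory (placeholder)
    quantize=True,
    q_bits=4,                          # 2 or 3 for the smaller variants
    q_group_size=64,
)
```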

Thanks for your comment and appreciate you giving it a try!
