Works very well on M3 Pro with 18GB RAM

#1 opened by peterai6377

The original gpt-oss-20b runs terribly on my Mac, but this quantized model gives me 40+ tokens/sec. Thanks!
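For anyone wanting to reproduce the speed numbers, here is a minimal sketch using the mlx-lm Python package; the model id below is a placeholder for this repo, and the prompt is arbitrary.

```python
# Minimal sketch: load the 4-bit MLX model and check generation speed.
# Assumes `pip install mlx-lm`; the model id is a placeholder for this repo.
from mlx_lm import load, generate

model, tokenizer = load("your-username/gpt-oss-20b-mlx-4bit")  # placeholder id

# verbose=True prints the output along with prompt/generation tokens-per-second,
# which is where the 40+ tok/s figure comes from.
text = generate(
    model,
    tokenizer,
    prompt="Explain quantization in one paragraph.",
    max_tokens=256,
    verbose=True,
)
```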


Glad to hear it worked better. The speed gains with MLX are definitely noticeable.

I also tried making 2-bit and 3-bit mixed versions, since I'd love to get this running in 8GB of RAM on older M1 MacBooks, but 4-bit retained the most coherence and output quality in testing.
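For reference, here is a sketch of how such a conversion can be produced with mlx-lm's convert utility; the output path is a placeholder, and q_bits is the knob that would change between the 4-bit, 3-bit, and 2-bit attempts described above.

```python
# Sketch of producing a quantized MLX conversion with mlx-lm.
# Output path is a placeholder; q_bits selects the quantization width.
from mlx_lm import convert

# 4-bit conversion (the variant that held up best in my testing).
convert(
    hf_path="openai/gpt-oss-20b",      # upstream weights
    mlx_path="gpt-oss-20b-mlx-4bit",   # output directory (placeholder)
    quantize=True,
    q_bits=4,                          # 2 or 3 for the smaller variants
    q_group_size=64,
)
```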

Thanks for your comment and appreciate you giving it a try!
