Running this model without quantization.

#9
by Daaku-C5 - opened

I'm trying to use this model on a VM that I have. Just a silly question: what is the minimum requirement to run this 8B model without quantization? I'm currently using a 24 GB GPU.
ERROR:
CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 21.99 GiB of which 23.75 MiB is free.
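For context, a rough back-of-envelope estimate: an 8B-parameter model in fp16/bf16 needs about 2 bytes per parameter for the weights alone (~16 GB), plus extra for activations and the KV cache. The sketch below is a minimal estimate, and the 20% overhead fraction is an assumption, not a measured figure — real usage depends on context length, batch size, and the framework.

```python
def estimate_vram_gb(n_params_billion: float,
                     bytes_per_param: int = 2,
                     overhead_frac: float = 0.2) -> float:
    """Rough inference VRAM estimate: weights plus a fudge factor
    for activations/KV cache (overhead_frac is an assumption)."""
    # 1e9 params * bytes_per_param bytes / 1e9 bytes-per-GB = GB of weights
    weights_gb = n_params_billion * bytes_per_param
    return weights_gb * (1 + overhead_frac)

# 8B model in fp16/bf16: ~16 GB weights, ~19 GB with overhead
print(f"{estimate_vram_gb(8):.1f} GB")
```

By this estimate a 24 GB card should be close to sufficient for short contexts, so an OOM with only ~24 MiB free often points to other processes holding GPU memory (check `nvidia-smi`) or a long context/batch inflating the KV cache.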
