sharding to enable loading and fine-tuning on a normal-size machine?
#1
by
sd3ntato
So if I understood correctly: to load this model in 8-bit, I would first need to load the full 32-bit version into RAM (made even harder by the fact that the model is dumped as a single 44 GB file) and only then quantize it?

Could we maybe get a sharded, 16-bit version of this model?

I've been getting very interesting results with the XL version, but I can't find any easy way to scale up to the XXL with the resources I have.