SmolLM 64bit
#6
by
snapo
- opened
Heya Smol Modelers,
I would like to ask whether anyone on the team has ever tried training a 50M-parameter model in 64-bit precision instead of 32-bit/bf16.
Was this ever considered?
The reason I am asking is that with current quants we are able to create really small "good" models, but somehow we are not able to go below 100M parameters for a "good" model. The idea is that 64-bit vectors could store a much larger amount of data, or be more accurate when predicting a vector (the choice of which token to output).
I would love to hear whether someone has already tried that.
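To make the "more accurate vectors" idea concrete, here is a minimal sketch in standard-library Python (my own illustration, not anything from the SmolLM codebase). It only shows how much numeric resolution each format has per value; the helper `round_to_float32` and the example number are made up for the demo, and it says nothing about whether a model could actually use the extra bits.

```python
import struct
import sys

def round_to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Machine epsilon: the gap between 1.0 and the next representable number.
EPS64 = sys.float_info.epsilon  # 2**-52, float64 has 52 stored mantissa bits
EPS32 = 2.0 ** -23              # float32 has 23 stored mantissa bits
EPS_BF16 = 2.0 ** -7            # bf16 has 7 stored mantissa bits

# Hypothetical weight value, chosen only for illustration.
value = 0.1234567890123456789
as32 = round_to_float32(value)
err32 = abs(as32 - value)  # information lost when storing the value in 32 bits

print("float64 epsilon:", EPS64)
print("float32 epsilon:", EPS32)
print("bf16 epsilon:   ", EPS_BF16)
print("float32 rounding error for the example value:", err32)
```

So each 64-bit number is far finer grained than a 32-bit or bf16 one. What I do not know is whether training can turn that extra resolution into extra model capacity.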
For inference, SmolLM3 can be run in 64-bit, and it improves benchmark results a little.
Best regards, and thanks in advance for an answer from the pros.
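As a rough picture of what changes at inference time, here is a toy sketch (standard-library Python, random numbers instead of real SmolLM3 weights, all names hypothetical). It computes the same "logits" once in 64-bit and once with emulated 32-bit rounding after every multiply and add, then compares them. It is not a benchmark and does not reproduce the result I mentioned.

```python
import random
import struct

def f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot64(a, b):
    """Dot product accumulated in 64-bit floats."""
    return sum(x * y for x, y in zip(a, b))

def dot32(a, b):
    """Dot product with every input, product and partial sum rounded to 32-bit."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = f32(acc + f32(f32(x) * f32(y)))
    return acc

rng = random.Random(0)  # fixed seed so the run is repeatable
HIDDEN, VOCAB = 256, 50  # toy sizes, far smaller than a real model

hidden_state = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]
output_rows = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in range(VOCAB)]

logits64 = [dot64(row, hidden_state) for row in output_rows]
logits32 = [dot32(row, hidden_state) for row in output_rows]

max_diff = max(abs(a - b) for a, b in zip(logits64, logits32))
same_choice = logits64.index(max(logits64)) == logits32.index(max(logits32))

print("largest logit difference:", max_diff)
print("same top token in both precisions:", same_choice)
```

In a toy like this the differences are tiny compared to the logits themselves, so the chosen token only changes when two candidates are nearly tied.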