
by caterpillarman - opened

How does this model tokenize text? I'm trying to train a model to speak Sanskrit. If I add whole-word tokens to the vocabulary, will the model find them even though they have higher indices than the individual character tokens? I would fine-tune afterwards, of course, once I figure out how to do that with Mistral-based models.
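To make the question concrete, here is a toy sketch of what I mean. It assumes the tokenizer checks added word tokens before falling back to character tokens, which I have not confirmed for this model. The vocabulary, the words, and the `tokenize` helper are all made up for illustration; none of it is the model's real tokenizer.

```python
# Toy illustration only: a made-up vocabulary, not the real tokenizer of any model.
base_vocab = {"a": 0, "d": 1, "h": 2, "m": 3, "r": 4, " ": 5}

# Hypothetical word tokens appended after the base vocabulary,
# so they receive higher IDs than the character tokens.
added_tokens = ["dharma", "rama"]
vocab = dict(base_vocab)
for tok in added_tokens:
    vocab[tok] = len(vocab)


def tokenize(text):
    """Greedy matching: try added tokens (longest first), then single characters."""
    ordered = sorted(added_tokens, key=len, reverse=True)
    ids = []
    i = 0
    while i < len(text):
        for tok in ordered:
            if text.startswith(tok, i):
                ids.append(vocab[tok])
                i += len(tok)
                break
        else:
            # No added token starts here, so fall back to the character token.
            ids.append(vocab[text[i]])
            i += 1
    return ids


print(tokenize("dharma rama"))  # word tokens are used despite their higher IDs
print(tokenize("mad"))          # text with no added word falls back to characters
```

In this sketch the ID is only a lookup key, so a higher index does not stop a word token from being matched. With a real model I assume the embedding matrix would also have to grow to cover the new IDs (in `transformers`, `model.resize_token_embeddings(len(tokenizer))`), and the new rows would mean nothing until fine-tuning.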
