Has the tokenizer of the base model (Mistral-7B-v0.1) been retrained?
#37 opened by LH0521
Hi,
I noticed that Mistral-7B-v0.1 is used as the base model. However, the original Mistral-7B-v0.1 uses BPE tokenization, whereas NV-Embed-v1 appears to use a word-by-word mapping instead.
Did you retrain the tokenizer? If so, was it because the latent layer needs to integrate words better?
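For reference, this is roughly how I compared the two tokenizers (a minimal sketch; the sample sentence is just for illustration, and the NV-Embed-v1 checkpoint may additionally require accepting its license / `trust_remote_code` for the model itself):

```python
from transformers import AutoTokenizer

# Load the tokenizers of both checkpoints from the Hub
base_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
embed_tok = AutoTokenizer.from_pretrained("nvidia/NV-Embed-v1")

text = "Tokenization differences between the two models."

# Compare the token pieces each tokenizer produces for the same input
print("Mistral-7B-v0.1:", base_tok.tokenize(text))
print("NV-Embed-v1:    ", embed_tok.tokenize(text))

# Also check whether the vocabulary sizes differ
print("Vocab sizes:", len(base_tok), len(embed_tok))
```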
Thanks!