Fix tokenizer.json with file from Qwen/Qwen2.5-14B

by MariusNocturnum - opened Dec 4, 2024

base: refs/heads/main ← from: refs/pr/3


MariusNocturnum

Dec 4, 2024

tokenizer.json is 11.4 MB and appears to have been corrupted during fine-tuning. It should be 7.03 MB, as in the base model (Qwen/Qwen2.5-14B).

Fix tokenizer.json with file from Qwen/Qwen2.5-14B (commit 56254757)

Crystalcareai changed pull request status to closed Dec 4, 2024

Crystalcareai

Arcee AI org · Dec 4, 2024 · edited Dec 4, 2024

Thanks! This is a common issue we've encountered with some of our internal merging tools. We'll work on addressing it going forward, at the very least by implementing stronger checks. While it doesn't throw errors during evaluation, testing, or quantization, it does cause problems with certain SageMaker endpoints, among other things. I swapped out the tokenizer with the one from Qwen's Instruct model, just to be safe.
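For anyone who wants a quick sanity check of this kind, here is a minimal sketch of what a "stronger check" could look like. It is not Arcee's tooling. It assumes both files are already downloaded locally and that tokenizer.json follows the usual layout with `model.vocab` and `model.merges`. The function names and the 1.25 size-ratio threshold are hypothetical choices made for illustration.

```python
import json
import os


def summarize_tokenizer(path):
    """Return file size in bytes plus vocab and merges counts for a tokenizer.json."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumed layout: {"model": {"vocab": {...}, "merges": [...]}, ...}
    model = data.get("model", {})
    return {
        "size_bytes": os.path.getsize(path),
        "vocab_size": len(model.get("vocab", {})),
        "merges": len(model.get("merges", [])),
    }


def check_against_reference(candidate_path, reference_path, max_size_ratio=1.25):
    """Compare a candidate tokenizer.json to a reference file.

    Returns a list of human-readable problems; an empty list means no
    difference was detected by these (deliberately coarse) checks.
    """
    cand = summarize_tokenizer(candidate_path)
    ref = summarize_tokenizer(reference_path)
    problems = []
    # Hypothetical threshold: flag files much larger than the reference.
    if cand["size_bytes"] > ref["size_bytes"] * max_size_ratio:
        problems.append(
            f"size {cand['size_bytes']} B exceeds {max_size_ratio}x "
            f"reference size {ref['size_bytes']} B"
        )
    if cand["vocab_size"] != ref["vocab_size"]:
        problems.append(
            f"vocab size {cand['vocab_size']} != reference {ref['vocab_size']}"
        )
    if cand["merges"] != ref["merges"]:
        problems.append(
            f"merges count {cand['merges']} != reference {ref['merges']}"
        )
    return problems
```

Calling `check_against_reference("tokenizer.json", "base/tokenizer.json")` (both paths are placeholders) returns a list of problems. With the sizes reported in this thread, 11.4 MB against 7.03 MB is a ratio of about 1.62, so the size check alone would flag the file under the assumed threshold.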
