[Error?] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
#7 opened by flexai
When using an example from https://huggingface.co/distil-whisper/distil-large-v3#sequential-long-form, I receive the warning "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained."
Is this expected, or does it indicate an error in the setup on my end?
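For context, the loading step I use follows that section of the model card roughly like this (a minimal sketch, abridged from memory; exact arguments such as `max_new_tokens` are illustrative):

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# In my run, the warning is printed while the model/processor are loaded.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "distil-whisper/distil-large-v3",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
model.to(device)
processor = AutoProcessor.from_pretrained("distil-whisper/distil-large-v3")

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=device,
)
```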
In addition to the loading example, I prepare the model locally during the Docker image build with the following function:
```python
def download_model():
    import os
    import transformers
    from huggingface_hub import snapshot_download

    # Ensure the cache folder exists (MODEL_CACHE_DIR is defined elsewhere in the module)
    os.makedirs(MODEL_CACHE_DIR, exist_ok=True)
    snapshot_download(
        repo_id="distil-whisper/distil-large-v3",
        allow_patterns=["model.safetensors", "*.json", "*.txt"],
        local_dir=MODEL_CACHE_DIR,
    )
    transformers.utils.move_cache()
```
Then, when loading, I pass MODEL_CACHE_DIR instead of the model ID string.
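Concretely, that loading step looks roughly like this (a minimal sketch; MODEL_CACHE_DIR simply stands in for the usual Hub model ID):

```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# Load from the locally downloaded snapshot instead of the Hub repo ID.
model = AutoModelForSpeechSeq2Seq.from_pretrained(MODEL_CACHE_DIR)
processor = AutoProcessor.from_pretrained(MODEL_CACHE_DIR)
```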
Hey boss, I haven't run it since, so let's close this issue until further notice! Btw, thank you for the models; they're hugely valuable.
flexai changed discussion status to closed