text-generation-inference error
Hi team,
I'm testing mims-harvard/ToolRAG-T1-GTE-Qwen2-1.5B using Hugging Face Text Generation Inference (TGI) 3.2.1 on both A100 and V100 GPUs, but I'm encountering the following error during model initialization:
RuntimeError: weight model.layers.0.self_attn.q_proj.weight does not exist
Steps taken:
Used TGI 3.2.1 (latest) with the following start command:

```shell
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.2.1 \
    --model-id mims-harvard/ToolRAG-T1-GTE-Qwen2-1.5B
```
I also tried TGI 3.1.0, but encountered the same issue.
Has anyone encountered this before or have any insights on resolving it? Thanks!
Hey @hbvvv1234, thanks for bringing this up! This is an embedding model, so you should be serving it with Text Embeddings Inference (TEI) rather than TGI. I tried to deploy it with the latest TEI version, but it failed as well; I've already submitted a PR to patch it, so I'll let you know once this model is available on TEI. Thanks!
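For reference, once the fix lands, serving the model with TEI should look roughly like the TGI command above, just with the TEI image instead. This is a sketch under assumptions: the image tag and whether this model is supported depend on the pending patch, so check the TEI releases for the correct tag for your GPU:

```shell
# Sketch only: the image tag below is a placeholder, since TEI publishes
# GPU-architecture-specific tags; pick the one matching your hardware.
docker run --gpus all -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-embeddings-inference:latest \
    --model-id mims-harvard/ToolRAG-T1-GTE-Qwen2-1.5B
```

Once running, embeddings are served over HTTP (e.g. the `/embed` endpoint) rather than TGI's generation endpoints.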