
text-generation-inference error

#1
by hbvvv1234 - opened

Hi team,

I'm testing mims-harvard/ToolRAG-T1-GTE-Qwen2-1.5B using Hugging Face Text Generation Inference (TGI) 3.2.1 on both A100 and V100 GPUs, but I'm encountering the following error during model initialization:

RuntimeError: weight model.layers.0.self_attn.q_proj.weight does not exist

Steps Taken:

Used TGI 3.2.1 (latest) with the following start command:

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.2.1 \
    --model-id mims-harvard/ToolRAG-T1-GTE-Qwen2-1.5B

Also tried TGI 3.1.0, but encountered the same issue.

Has anyone encountered this before or have any insights on resolving it? Thanks!

Hey @hbvvv1234, thanks for bringing this up! Indeed, you should be using Text Embeddings Inference (TEI) instead, since this model serves embeddings rather than text generation. I tried to deploy it with the latest TEI version but it failed, so I've already submitted a PR to patch it. I'll let you know once this model is available on TEI. Thanks!
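For reference, once the patch lands, a TEI deployment would look roughly like the sketch below. This follows TEI's standard Docker usage; the image tag and port mapping are illustrative assumptions, not confirmed values for this model:

```shell
# Sketch of a TEI deployment for this model -- the image tag is an
# assumption, and the model will only load once the patch is released.
model=mims-harvard/ToolRAG-T1-GTE-Qwen2-1.5B
volume=$PWD/data  # cache downloaded weights across container restarts

docker run --gpus all -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-embeddings-inference:latest \
    --model-id $model
```

Embeddings could then be requested from TEI's /embed endpoint, e.g. with curl 127.0.0.1:8080/embed -X POST -H 'Content-Type: application/json' -d '{"inputs": "your text here"}'.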
