text-generation-inference error
Hi team,
I'm testing mims-harvard/ToolRAG-T1-GTE-Qwen2-1.5B using Hugging Face Text Generation Inference (TGI) 3.2.1 on both A100 and V100 GPUs, but I'm encountering the following error during model initialization:
RuntimeError: weight model.layers.0.self_attn.q_proj.weight does not exist
Steps taken:
Used TGI 3.2.1 (latest) with the following start command:

```shell
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.2.1 \
    --model-id mims-harvard/ToolRAG-T1-GTE-Qwen2-1.5B
```
I also tried TGI 3.1.0, but encountered the same issue.
Has anyone encountered this before or have any insights on resolving it? Thanks!
Hey @hbvvv1234, thanks for bringing this up! This is an embedding model, so you should be serving it with Text Embeddings Inference (TEI) rather than TGI. I tried to deploy it with the latest TEI version, but it failed as well; I've already submitted a PR to patch it, so I'll let you know once this model is available on TEI. Thanks!
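For reference, once the fix lands, serving the model with TEI should look roughly like the TGI command above, just with the TEI image instead. This is a sketch under assumptions: the image tag and whether this model is supported depend on the pending patch, so check the TEI releases for the correct tag for your GPU:

```shell
# Sketch only: the image tag below is a placeholder, since TEI publishes
# GPU-architecture-specific tags; pick the one matching your hardware.
docker run --gpus all -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-embeddings-inference:latest \
    --model-id mims-harvard/ToolRAG-T1-GTE-Qwen2-1.5B
```

Once running, embeddings are served over HTTP (e.g. the `/embed` endpoint) rather than TGI's generation endpoints.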