Quick Fix: Rope Scaling or Rope Type Error
Hi Everyone,
Instead of using the default image URI, use:
763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0
If this requires specific access to ECR, you can pull the above image, create a new image based on it, and push it to your private ECR repository. Then you can use the URI from your private repository directly.
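For anyone going the private-repository route, here is a rough sketch of the pull, tag, and push steps. It only builds the target URI and prints the commands to run; the account ID, region, and repository name are placeholders I made up, and I'm assuming the usual AWS CLI and Docker commands are available on your machine.

```python
# Image URI quoted in the post above.
SOURCE_IMAGE = (
    "763104351884.dkr.ecr.us-west-2.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:"
    "2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"
)


def ecr_login(registry, region):
    """Return the shell command that logs Docker in to an ECR registry."""
    return (
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}"
    )


def mirror_commands(source_image, account_id, region, repo_name):
    """Build the private image URI and the commands that copy the image there.

    account_id, region and repo_name are hypothetical values supplied by you.
    """
    source_registry, _, rest = source_image.partition("/")
    _, _, tag = rest.partition(":")
    # Registry host looks like <account>.dkr.ecr.<region>.amazonaws.com
    source_region = source_registry.split(".")[3]

    target_registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    target_image = f"{target_registry}/{repo_name}:{tag}"

    commands = [
        ecr_login(source_registry, source_region),
        f"docker pull {source_image}",
        # The target repository has to exist before the push.
        f"aws ecr create-repository --repository-name {repo_name} --region {region}",
        f"docker tag {source_image} {target_image}",
        ecr_login(target_registry, region),
        f"docker push {target_image}",
    ]
    return target_image, commands


if __name__ == "__main__":
    # Placeholder account, region and repository name.
    image, steps = mirror_commands(SOURCE_IMAGE, "111122223333", "us-east-1", "tgi-mirror")
    print("\n".join(steps))
    print("Use this image URI:", image)
```

If you deploy through the SageMaker Python SDK (an assumption on my side, the post does not say), the resulting URI would go into the `image_uri` argument of `HuggingFaceModel`.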
I hope this will solve your problem :)
Thanks
It didn't work for me :(
What error are you getting?
I used a bitsandbytes-quantized model that I fine-tuned (specifically the unsloth version). It reported an unknown quant method, bitsandbytes, and then:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4145x4096 and 1x12582912)
2024/07/31 08:43:36
2024-07-31T15:43:36.130106Z ERROR warmup{max_input_length=4095 max_prefill_tokens=4145 max_total_tokens=4096 max_batch_size=None}:warmup: text_generation_client: router/client/src/lib.rs:46: Server error: CANCELLED
2024/07/31 08:43:36
Error: WebServer(Warmup(Generation("CANCELLED")))
2024/07/31 08:43:36
2024-07-31T15:43:36.300019Z ERROR text_generation_launcher: Webserver Crashed
2024/07/31 08:43:36
2024-07-31T15:43:36.300043Z  INFO text_generation_launcher: Shutting down shards
2024/07/31 08:43:36
2024-07-31T15:43:36.372684Z  INFO shard-manager: text_generation_launcher: Terminating shard rank=0
2024/07/31 08:43:36
2024-07-31T15:43:36.372816Z  INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=0
2024/07/31 08:43:36
2024-07-31T15:43:36.473032Z  INFO shard-manager: text_generation_launcher: shard terminated rank=0
2024/07/31 08:43:36
Error: WebserverFailed
and some others. Thoughts?
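A note on the shape in that RuntimeError, in case it helps with debugging. The numbers are consistent with a weight that is still in packed 4-bit form being used as if it were a normal dense matrix. As far as I know, bitsandbytes stores a 4-bit weight as a flat uint8 column with two values per byte, so its shape is (numel / 2, 1), and transposing that gives the 1x12582912 operand in the message. The quick check below is only arithmetic on the numbers in the log; the interpretation is my assumption, not something the log states.

```python
def packed_4bit_shape(out_features, in_features):
    """Shape of a 4-bit weight packed two values per byte into one uint8 column.

    Assumes the (numel / 2, 1) layout described above.
    """
    numel = out_features * in_features
    return (numel // 2, 1)


# Numbers taken from the error: mat1 is 4145x4096, mat2 is 1x12582912.
hidden_size = 4096
packed_length = 12582912

# If the layer takes hidden_size inputs, how many outputs would it have?
out_features = packed_length * 2 // hidden_size
print(out_features)

# Round trip: a dense (out_features x hidden_size) weight packs to this shape.
print(packed_4bit_shape(out_features, hidden_size))
```

If that reading is right, the server never unpacked the quantized weights, which would fit with the "unknown quant method bitsandbytes" message that came first.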
Now it's saying:
2024/07/31 10:56:08
2024-07-31T17:56:08.516454Z  INFO text_generation_router::server: router/src/server.rs:1599: Using scheduler V3
2024/07/31 10:56:08
2024-07-31T17:56:08.516472Z  INFO text_generation_router::server: router/src/server.rs:1651: Setting max batch total tokens to 442032
2024/07/31 10:56:08
2024-07-31T17:56:08.570658Z  INFO text_generation_router::server: router/src/server.rs:1889: Connected
but it never finishes initializing.
Is it a port issue?