Anyone facing issues deploying this model on SageMaker?

#1
by genzml - opened

I am using an ml.g5.12xlarge instance to deploy. The model deployment fails during health checks with the error below:

sagemaker.exceptions.UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-06-22-22-26-05-836: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..
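Since the error points at CloudWatch, here is a minimal sketch of how the container logs for this endpoint could be pulled. It assumes boto3 is installed and AWS credentials and region are already configured. The helper names `log_group_for_endpoint` and `fetch_endpoint_logs` are made up for illustration, and the log group path follows the usual `/aws/sagemaker/Endpoints/<endpoint-name>` convention.

```python
from typing import List


def log_group_for_endpoint(endpoint_name: str) -> str:
    """Build the CloudWatch log group name for a SageMaker endpoint.

    Assumes the standard /aws/sagemaker/Endpoints/<endpoint-name> layout.
    """
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def fetch_endpoint_logs(endpoint_name: str, limit: int = 50) -> List[str]:
    """Return up to `limit` log messages from the endpoint's log group.

    Hypothetical helper: needs boto3 plus valid AWS credentials at call time.
    """
    # Imported lazily so the name helper above works without boto3 installed.
    import boto3

    logs = boto3.client("logs")
    response = logs.filter_log_events(
        logGroupName=log_group_for_endpoint(endpoint_name),
        limit=limit,
    )
    return [event["message"] for event in response["events"]]
```

Calling `fetch_endpoint_logs("huggingface-pytorch-tgi-inference-2023-06-22-22-26-05-836")` should return the container's startup messages, which is usually where the reason for a failed ping health check shows up.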

H2O.ai org

Hi @genzml - unfortunately I have no experience with SageMaker. It might be good to check with them to see if they know what's up.

psinger changed discussion status to closed
