Deploying the model on a cloud machine
Hello all, I tried downloading the model locally, and after the download finished I ran the sample code, but it showed an error related to the offload folder path. I didn't manage to solve it; actually, I don't know what that is.
So I'm trying to deploy the model on a virtual machine to get the suitable specs. I am using RunPod,
and I get this error on the sixth file of the model download:
ERROR text_generation launcher: An error occurred while downloading using hf_transfer. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
Can anyone help with either issue: how to use it locally (a step-by-step guide for a regular laptop), or the steps to deploy it on the cloud and use it through an API?
thanks
Hi @MazenSiraj,
The offload folder issue can be solved by adding offload_folder='offload':
self.model = AutoModelForCausalLM.from_pretrained(path, device_map="auto", offload_folder='offload', trust_remote_code=True)
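For context, a minimal standalone sketch of that same call (assuming path points at the jais-13b-chat repo id or a local clone; the offload folder is just a directory on disk where layers that don't fit in memory get written):

from transformers import AutoModelForCausalLM, AutoTokenizer

path = "inception-mbzuai/jais-13b-chat"  # or the path to a local clone
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path,
    device_map="auto",         # let accelerate spread layers across GPU/CPU
    offload_folder="offload",  # spill layers that don't fit in memory to this dir
    trust_remote_code=True,    # jais ships its own modeling code
)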
I have submitted a pull request so that the model can be deployed on a Hugging Face Inference Endpoint:
https://huggingface.co/inception-mbzuai/jais-13b-chat/discussions/12
While the PR is being reviewed, you can check out my copy of this model, which already has those changes; see the Deploy button in the top-right corner.
Please note that you will need a beefy machine to run it. I was able to run it on GPU [large] · 4x Nvidia Tesla T4, which is $4.50 per hour; the small and medium machines were not able to run it.
https://huggingface.co/poiccard/jais-13b-chat-adn
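Once the endpoint is up, you can call it over plain HTTP. A rough sketch (the URL and token are placeholders for your own endpoint; the payload shape is the standard text-generation one):

import requests

API_URL = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"  # placeholder: copy from the endpoint page
headers = {"Authorization": "Bearer hf_..."}                   # placeholder: your HF access token

payload = {
    "inputs": "What is the capital of UAE?",
    "parameters": {"max_new_tokens": 200},
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())  # typically [{"generated_text": "..."}]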
Hi @poiccard,
Thank you so much, I will check it out. May I ask: I tried to run it on my machine and it ran, but every time I run the sample code it downloads everything again?
If you could help with the steps to run the model and use it, that would be helpful.
thanks
@poiccard this is what I get every time I run the sample code: it starts downloading all over again. I don't think this is how it should go, correct?
Hi,
How did you clone it? Make sure you have actually downloaded the .bin files, not just the LFS pointer references:
git lfs install
git clone https://huggingface.co/inception-mbzuai/jais-13b-chat
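A quick way to check: real shards are gigabytes each, while LFS pointer stubs are only a few hundred bytes. A small sketch (the clone path and *.bin pattern are assumptions, adjust to what you actually have):

import glob, os

for f in sorted(glob.glob("jais-13b-chat/*.bin")):
    size_mb = os.path.getsize(f) / 1024**2
    print(f"{f}: {size_mb:.1f} MB")  # ~0.0 MB means you only got the LFS pointer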
This model is big and is divided into pieces (shards). What the sample code does next is load those shards into memory, so it is not downloading them again, just loading them.
You can read more here:
https://huggingface.co/docs/accelerate/v0.19.0/en/usage_guides/big_modeling
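That guide describes what from_pretrained with device_map="auto" does under the hood; roughly something like this sketch, following the accelerate docs (the local clone path is an assumption):

from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

path = "./jais-13b-chat"  # assumed local clone
config = AutoConfig.from_pretrained(path, trust_remote_code=True)

# 1) build the model skeleton without allocating real weights
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

# 2) stream the shards from disk onto the right devices piece by piece
model = load_checkpoint_and_dispatch(
    model, path, device_map="auto", offload_folder="offload"
)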
I was not able to launch this model on my machine, but I got in contact with the model creators, and inshallah we will be working on improvements.
In the meantime, as I mentioned previously, you can deploy my version of the model on a Hugging Face Inference Endpoint ($4.50 per hour; you can put it to sleep when you don't need it).