🚩 403 Forbidden: error: The model CohereForAI/c4ai-command-r-plus is too large to be loaded automatically (207GB > 10GB).

#62
by gbhall - opened

Dammit. I'm suddenly now getting a 403 and the following response:

{
    "error": "The model CohereForAI/c4ai-command-r-plus is too large to be loaded automatically (207GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints)."
}

Endpoint: https://api-inference.huggingface.co/models/CohereForAI/c4ai-command-r-plus/v1/chat/completions

Is the API inference for this model no longer supported @shivi ? I was relying upon it in a few apps in production with lots of users.

I also notice the page now says "Inference API (serverless) has been turned off for this model." 😔 Is this the end for this model?

Cohere For AI org

Hi @gbhall , the inference endpoint API is provided by Huggingface but we were not officially supporting it. If you are using your model in production, we recommend to check our API -- https://docs.cohere.com/reference/chat

The usage of our models for commercial purposes is not permitted, as per our license

Hi @alexrs , thank you for your response. Do you have any insight into why it has been disabled? I understand it’s out of your control but do you have any contacts at HF you can reach out to?

Thank you. I’m not using the API for commercial purposes. Just useful utilities.

I’ve had a look at your API, unfortunately HF was desirable as it’s directly compatible with the ChatGPT Chat Completions API, using the HF TGI model. This allows you to swap in and out models simply by providing a new endpoint URL.

Hi, I work at HF. I believe we switched to support the newer cohere model: https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024

If you need it up 24/7, consider using Inference Endpoints (for non-commercial purposes), or Cohere's API (for commercial).

Hi @nbroad , thank you very much! That is such good news!

Can I ask why not just update this model to the newer version? And do you have any tips how I can keep abreast / informed if a model is deactivated and switched to a newer one?

I tried the Dedicated Inference Endpoints yesterday. Unfortunately 3 requests ended up costing 45 min of compute time with a 15 min cooldown period, for a total of $6 USD. Since I'm doing non-commerical purposes this is untenable for me unfortunately, hence why I pay for the HF Pro subscription.

Can I ask why not just update this model to the newer version? And do you have any tips how I can keep abreast / informed if a model is deactivated and switched to a newer one?

It's better to let users access both models. This was Cohere's decision, not HF's.

I tried the Dedicated Inference Endpoints yesterday. Unfortunately 3 requests ended up costing 45 min of compute time with a 15 min cooldown period, for a total of $6 USD. Since I'm doing non-commerical purposes this is untenable for me unfortunately, hence why I pay for the HF Pro subscription.

Try Cohere's API then.

Hi, I work at HF. I believe we switched to support the newer cohere model: https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024

If you need it up 24/7, consider using Inference Endpoints (for non-commercial purposes), or Cohere's API (for commercial).

Hi @nbroad , damn seems the new model has also been disabled.

For reasons I've listed already, my use case requires the OpenAI compatibility with the Text Generation Inference (TGI) capable Serverless Inference API, which Cohere does not support.

@nbroad , @alexrs is it possible to keep one of these models enabled on HF please. I don't know who's decision and call that is, but this is an extremely useful model to use on HF.

Edit: Nevermind, on the page of https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024 it says it's disabled, but the Serverless Inference API works still.

Sign up or log in to comment