Supported Models

Given the fast-paced nature of the open ML ecosystem, the Inference API exposes models that have large community interest and are in active use (based on recent likes, downloads, and usage). Because of this, deployed models can be swapped without prior notice. The Hugging Face stack aims to keep all the latest popular models warm and ready to use.
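
For example, you can check whether a specific model is currently deployed (warm) before sending requests. The sketch below uses the `huggingface_hub` client; it assumes the library is installed, that a valid token is available (for instance via `huggingface-cli login` or the `HF_TOKEN` environment variable), and that the model id shown is only an illustration:

```python
from huggingface_hub import InferenceClient

client = InferenceClient()

# Ask the serverless Inference API whether this model is deployed right now.
status = client.get_model_status("meta-llama/Meta-Llama-3-8B-Instruct")

print(status.state)   # e.g. "Loaded" when the model is warm
print(status.loaded)  # True if requests can be served immediately
```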

What do I get with a PRO subscription?

In addition to thousands of public models available on the Hub, PRO and Enterprise users get higher rate limits and free access to the following models:

| Model | Size | Supported Context Length | Use |
|---|---|---|---|
| Meta Llama 3.1 Instruct | 8B, 70B | 70B: 32k tokens / 8B: 8k tokens | High quality multilingual chat model with large context length |
| Meta Llama 3 Instruct | 8B, 70B | 8k tokens | One of the best chat models |
| Llama 2 Chat | 7B, 13B, 70B | 4k tokens | One of the best conversational models |
| Bark | 0.9B | - | Text to audio generation |
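
As an illustration, a PRO or Enterprise token can be used to query one of the chat models above through the `huggingface_hub` client. This is a minimal sketch, assuming the library is installed, a PRO token is exported as `HF_TOKEN`, and the model is currently deployed:

```python
import os

from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])

# Chat completion against one of the PRO-tier models listed above.
response = client.chat_completion(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Summarize the Inference API in one sentence."}],
    max_tokens=128,
)

print(response.choices[0].message.content)
```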

Running Private Models

The free Serverless API is designed to run popular public models. If you have a private model, you can use Inference Endpoints to deploy it.
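
Once the endpoint is running, you can query it in much the same way as the Serverless API by pointing the client at the endpoint URL instead of a model id. The sketch below assumes `huggingface_hub` is installed and uses a placeholder URL; replace it with the URL shown in your endpoint's dashboard:

```python
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://your-endpoint-name.endpoints.huggingface.cloud",  # placeholder endpoint URL
    token=os.environ["HF_TOKEN"],  # token for an account with access to the endpoint
)

# Example request against a text-generation endpoint.
output = client.text_generation("The quick brown fox", max_new_tokens=20)
print(output)
```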
