Cohere
Cohere brings you cutting-edge multilingual models, advanced retrieval, and an AI workspace tailored for the modern enterprise — all within a single, secure platform.
Cohere is the first model creator to share and serve their models directly on the Hub as an Inference Provider.
Supported tasks
Chat Completion (LLM)
Find out more about Chat Completion (LLM) here.
from huggingface_hub import InferenceClient

# Route the request to Cohere via the Hugging Face Inference Providers API,
# authenticating with your Hugging Face token
client = InferenceClient(
    provider="cohere",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="CohereLabs/c4ai-command-a-03-2025",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    max_tokens=500,
)

print(completion.choices[0].message)
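If you prefer to display tokens as they are generated rather than waiting for the full reply, the same call can be made in streaming mode. The sketch below is a minimal example, assuming the `stream=True` parameter of the OpenAI-compatible client and the chunk/delta field layout exposed by `huggingface_hub`; the model name is the same one used above.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="cohere",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

# Ask for a streamed response instead of a single completion object
stream = client.chat.completions.create(
    model="CohereLabs/c4ai-command-a-03-2025",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental text delta; the final chunk may be empty
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")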
Chat Completion (VLM)
Find out more about Chat Completion (VLM) here.
from huggingface_hub import InferenceClient

# Same client setup as above; the Aya Vision model accepts mixed text and image inputs
client = InferenceClient(
    provider="cohere",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="CohereLabs/aya-vision-8b",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ],
    max_tokens=500,
)

print(completion.choices[0].message)
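The example above passes a public image URL. If your image lives on disk, one common approach with OpenAI-style chat APIs is to embed it as a base64 data URL in the same `image_url` field. This is a minimal sketch under that assumption; the file name `statue_of_liberty.jpg` is a placeholder, and acceptance of `data:` URLs depends on the provider's image handling.

import base64

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="cohere",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

# Read a local file (placeholder path) and encode it as a base64 data URL
with open("statue_of_liberty.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="CohereLabs/aya-vision-8b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=500,
)

print(completion.choices[0].message)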