Inference Providers documentation
Groq
All supported Groq models can be found here
Groq provides fast AI inference. Its LPU technology is designed to deliver high performance and efficiency for GenAI models. With custom chips built specifically for AI inference workloads and a deterministic, software-first approach, Groq aims to remove the bottlenecks of conventional hardware, enabling real-time AI applications with predictable latency and high throughput.
For the latest pricing, visit the Groq pricing page.
Resources
- Website: https://groq.com/
- Documentation: https://console.groq.com/docs
- Community Forum: https://community.groq.com/
- X: @GroqInc
- LinkedIn: Groq
- YouTube: Groq
Supported tasks
Chat Completion (LLM)
Find out more about Chat Completion (LLM) here.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="openai/gpt-oss-20b:groq",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

print(completion.choices[0].message)
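
The model string in the example above appears to combine a model ID (`openai/gpt-oss-20b`) with a provider suffix (`:groq`). The following is a minimal sketch, assuming that `model_id:provider` form is how the router selects Groq, that builds the request arguments separately so they can be inspected before anything is sent. The helper names `with_provider` and `build_chat_request` are hypothetical and not part of any library.

```python
def with_provider(model_id: str, provider: str = "groq") -> str:
    """Append a provider suffix to a model ID, e.g. 'org/model:groq'.

    Assumption: the router accepts the 'model_id:provider' form used in
    the example above.
    """
    if ":" in model_id:
        raise ValueError(f"model ID already has a provider suffix: {model_id!r}")
    return f"{model_id}:{provider}"


def build_chat_request(model_id: str, prompt: str, provider: str = "groq") -> dict:
    """Build keyword arguments for a single-turn chat completion request."""
    return {
        "model": with_provider(model_id, provider),
        "messages": [{"role": "user", "content": prompt}],
    }


request = build_chat_request("openai/gpt-oss-20b", "What is the capital of France?")
print(request["model"])  # openai/gpt-oss-20b:groq
```

The resulting dictionary can be passed to the client from the example above as `client.chat.completions.create(**request)`.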
