Structured Outputs with Inference Providers

In this guide, we’ll show you how to use Inference Providers to generate structured outputs that follow a specific JSON schema. This is incredibly useful for building reliable AI applications that need predictable, parsable responses.

Structured outputs guarantee a model returns a response that matches your exact schema every time. This eliminates the need for complex parsing logic and makes your applications more robust.

This guide assumes you have a Hugging Face account. If you don’t have one, you can create one for free at huggingface.co.

What Are Structured Outputs?

Structured outputs make sure model responses always follow a specific structure, typically a JSON Schema. This means you get predictable, type-safe data that integrates easily with your systems. The model follows a strict template so you always get the data in the format you expect.

Traditionally, getting structured data from LLMs required prompt engineering (asking the model to “respond in JSON format”), post-processing and parsing the response, and sometimes retrying when parsing failed. This approach is unreliable and can lead to brittle applications.

With structured outputs, you get:

Guaranteed compliance with your defined schema
Fewer errors from malformed or unparsable JSON
Easier integration with downstream systems
No need for retry logic or complex error handling
More efficient use of tokens (less verbose instructions)

In short, structured outputs make your applications more robust and reliable by ensuring every response matches your schema, with built-in validation and type safety.

Step 1: Define Your Schema

Before making any API calls, you need to define the structure you want. Let’s build a practical example: extracting structured information from research papers. This is a common real-world use case where you need to parse academic papers and extract key details like title, authors, contributions, and methodology.

We’ll create a simple schema that captures the most essential elements: the paper’s title and a summary of its abstract. The easiest way to do this is to use Pydantic, a library that allows you to define Python classes that represent JSON schemas (among other things).

from pydantic import BaseModel

class PaperAnalysis(BaseModel):
    title: str
    abstract_summary: str

Using model_json_schema we can convert the Pydantic model to a JSON Schema which is what the model will receive as a response format instruction. This is the schema that the model will use to generate the response.

{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "abstract_summary": {"type": "string"}
  },
  "required": ["title", "abstract_summary"]
}

This simple schema ensures we’ll always get the paper’s title and a concise summary of its abstract. Notice how we mark both fields as required - this guarantees they’ll always be present in the response, making our application more reliable.

Step 2: Set up your inference client

Now that we have our schema defined, let’s set up the client to communicate with the inference providers. We’ll show you two approaches: the Hugging Face Hub client (which gives you direct access to all Inference Providers) and the OpenAI client (which works through OpenAI-compatible endpoints).

huggingface_hub

openai

Structured outputs are a good use case for selecting a specific provider and model because you want to avoid incompatibility issues between the model, provider and the schema.

Step 3: Generate structured output

Now let’s extract structured information from a research paper. We’ll send the paper content to the model along with our schema, and get back perfectly structured data.

For this example, we’ll analyze a famous AI research paper. The model will read the paper and extract the key information according to our predefined schema.

huggingface_hub

openai

Step 4: Handle the Response

Both approaches guarantee that your response will match the specified schema. Here’s how to access and use the structured data:

huggingface_hub

openai

The structured output might look like this:

{
  "title": "Attention Is All You Need",
  "abstract_summary": "Introduces the Transformer architecture based solely on attention mechanisms, eliminating recurrence and convolutions for sequence transduction tasks. Shows superior quality in machine translation while being more parallelizable and requiring less training time."
}

You can now confidently process this data without worrying about parsing errors or missing fields. The schema validation ensures that required fields are always present and data types are correct.

Complete Working Example

Here’s a complete script that you can run immediately to see structured outputs in action:

Click to expand complete script

huggingface_hub

openai

Next Steps

Now that you understand structured outputs, you probably want to build an application that uses them. Here are some ideas for fun things you can try out:

Different models: Experiment with different models. The biggest models are not always the best for structured outputs!
Multi-turn conversations: Maintaining structured format across conversation turns.
Complex schemas: Building domain-specific schemas for your use case.
Performance optimization: Choosing the right provider for your structured output needs.

< > Update on GitHub