How to output in a structured format with a Pydantic model from a Transformers model?

#10
by Vishva007 - opened

How can I use a Pydantic model to structure the output of a Transformers-based LLM, ensuring the generated responses follow a predefined schema? Specifically, how do I define a Pydantic model, integrate it with a model loaded via Transformers' from_pretrained(), and ensure that the model's output adheres to the expected structured format?

Did you find a solution?

Hi, you can use the outlines library (https://github.com/dottxt-ai/outlines) to do constrained decoding of the LLM output according to a Pydantic schema.

import json

import outlines

repo_id = "Qwen/Qwen2.5-7B-Instruct"
model = outlines.models.transformers(repo_id)

# In Pydantic v2, .schema() is deprecated in favor of .model_json_schema()
schema_as_str = json.dumps(RelevancyDetector.model_json_schema())

generator = outlines.generate.json(model, schema_as_str)

Here RelevancyDetector is a Pydantic model.
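The RelevancyDetector model itself isn't shown in the thread; a minimal sketch of what such a model might look like (the field names here are illustrative assumptions, not from the original post):

```python
import json

from pydantic import BaseModel, Field


# Hypothetical RelevancyDetector model; any Pydantic model works the same way.
class RelevancyDetector(BaseModel):
    is_relevant: bool = Field(..., description="Whether the text is relevant.")
    reason: str = Field(..., description="Short justification for the verdict.")


# This is the JSON schema string that gets passed to outlines.generate.json
schema_as_str = json.dumps(RelevancyDetector.model_json_schema())
print(schema_as_str)
```

The generator then only emits JSON that satisfies this schema, so every response can be parsed without post-hoc cleanup.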


Thank you so much for the detailed guidance, Sakuna! Your example was incredibly helpful. The code works like a charm, and I now understand how to use the outlines library for constrained decoding based on the Pydantic schema. I also found that it works with batch processing, just like the code you provided. I really appreciate you taking the time to explain everything. It made the process much smoother!

import json
import outlines
import torch
from typing import List, Optional
from pydantic import BaseModel, Field

# Pick the device based on availability instead of hardcoding "cuda"
if torch.cuda.is_available():
    device = "cuda"
    print("GPU is used!")
else:
    device = "cpu"
    print("CPU is used!")

repo_id = "Qwen/Qwen2.5-0.5B-Instruct"
model = outlines.models.transformers(repo_id,
                                     device=device,
                                     model_kwargs={"temperature": 0.5})

input_list = [
    "My cat, Whiskers, enjoys a variety of toys: feather wands, laser pointers, and those little crinkly balls.",
    "Spot, our energetic dog, loves his snacks: peanut butter biscuits, chewy ropes, and the occasional carrot stick.",
]

class PetInfo(BaseModel):
    pet_name: str = Field(..., description="The name of the pet.")
    pet_type: Optional[str] = Field(None, description="The type of the pet (e.g., cat, dog).")
    items: List[str] = Field(..., description="List of items the pet enjoys.")

schema_as_str = json.dumps(PetInfo.model_json_schema())
generator = outlines.generate.json(model, schema_as_str)

output = generator(input_list)
print(json.dumps(output, indent=4))
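Since the generator returns plain dicts that match the schema, they can also be validated back into typed PetInfo instances with Pydantic; a sketch using an illustrative sample dict (not actual model output):

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class PetInfo(BaseModel):
    pet_name: str = Field(..., description="The name of the pet.")
    pet_type: Optional[str] = Field(None, description="The type of the pet (e.g., cat, dog).")
    items: List[str] = Field(..., description="List of items the pet enjoys.")


# Illustrative dict standing in for one element of the generator's output
sample = {
    "pet_name": "Whiskers",
    "pet_type": "cat",
    "items": ["feather wands", "laser pointers", "crinkly balls"],
}

# model_validate raises a ValidationError if the dict violates the schema
pet = PetInfo.model_validate(sample)
print(pet.pet_name)  # → Whiskers
```

Working with validated objects instead of raw dicts gives you attribute access and type guarantees downstream.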
Vishva007 changed discussion status to closed