How to produce structured output with a Pydantic model from a Transformers model?
How can I use a Pydantic model to structure the output of a Transformers-based LLM, so that the generated responses follow a predefined schema? Specifically, how do I define a Pydantic model, use it with a model loaded via Transformers' from_pretrained(), and ensure that the model’s output adheres to the expected structured format?
Did you find a solution?
Hi, you can use the outlines library (https://github.com/dottxt-ai/outlines) to constrain the LLM's decoding so that its output follows a Pydantic schema.
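import json
import outlines
repo_id = "Qwen/Qwen2.5-7B-Instruct"
# Load the Hugging Face model through the outlines Transformers wrapper
model = outlines.models.transformers(repo_id)
# RelevancyDetector is your own Pydantic model; on Pydantic v2, model_json_schema() replaces the deprecated schema()
schema_as_str = json.dumps(RelevancyDetector.schema())
# The generator constrains decoding so that the output matches the JSON schema
generator = outlines.generate.json(model, schema_as_str)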
Here RelevancyDetector is a Pydantic model.
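The definition of RelevancyDetector is not shown in this thread, so here is a minimal sketch of what such a model and its schema string could look like. The field names is_relevant and reason are hypothetical, chosen only for illustration, and the sketch assumes Pydantic v2 (model_json_schema()).

```python
import json

from pydantic import BaseModel, Field


class RelevancyDetector(BaseModel):
    # Hypothetical fields: the real model from the answer above is not shown.
    is_relevant: bool = Field(..., description="Whether the passage is relevant to the query.")
    reason: str = Field(..., description="Short justification for the decision.")


# Build the JSON schema as a dict, then serialize it to the string
# that would be passed to outlines.generate.json(model, schema_as_str).
schema = RelevancyDetector.model_json_schema()
schema_as_str = json.dumps(schema)

print(sorted(schema["properties"]))
print(schema["required"])
```

Both fields are declared with `...`, so they should appear in the schema's "required" list, which is what tells the constrained decoder that they cannot be omitted.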
Thank you so much for the detailed guidance, Sakuna! Your example was incredibly helpful. The code works like a charm, and I now understand how to use the outlines library for constrained decoding based on the Pydantic schema. I also found that it works with batch processing, just like the code you provided. I really appreciate you taking the time to explain everything. It made the process much smoother!
import json
import outlines
import torch
from typing import List, Optional
from pydantic import BaseModel, Field

# Pick the device once so the model load below matches the available hardware
if torch.cuda.is_available():
    device = "cuda"
    print("GPU is Used!")
else:
    device = "cpu"
    print("CPU is Used!")

repo_id = "Qwen/Qwen2.5-0.5B-Instruct"
# model_kwargs is forwarded to from_pretrained() when the model is loaded
model = outlines.models.transformers(repo_id,
                                     device=device,
                                     model_kwargs={"temperature": 0.5})

# Each string is one prompt; the whole list is processed as a batch
input_list = [
    "My cat, Whiskers, enjoys a variety of toys: feather wands, laser pointers, and those little crinkly balls",
    "Spot, our energetic dog, loves his snacks: peanut butter biscuits, chewy ropes, and the occasional carrot stick.",
]

class PetInfo(BaseModel):
    pet_name: str = Field(..., description="The name of the pet.")
    pet_type: Optional[str] = Field(None, description="The type of the pet (e.g., cat, dog).")
    items: List[str] = Field(..., description="List of items the pet enjoys.")

# Serialize the Pydantic schema and build a generator constrained to it
schema_as_str = json.dumps(PetInfo.model_json_schema())
generator = outlines.generate.json(model, schema_as_str)

# One structured result is returned per prompt in the list
output = generator(input_list)
print(json.dumps(output, indent=4))
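If you want typed objects rather than plain dictionaries, one option is to validate each result against the same Pydantic model afterwards. The following is a minimal sketch assuming Pydantic v2; sample_output is a hand-written, hypothetical stand-in for the generator's results (not real model output), so the snippet runs without loading an LLM.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class PetInfo(BaseModel):
    pet_name: str = Field(..., description="The name of the pet.")
    pet_type: Optional[str] = Field(None, description="The type of the pet (e.g., cat, dog).")
    items: List[str] = Field(..., description="List of items the pet enjoys.")


# Hypothetical stand-in for `output = generator(input_list)`.
sample_output = [
    {"pet_name": "Whiskers", "pet_type": "cat", "items": ["feather wands", "laser pointers"]},
    {"pet_name": "Spot", "items": ["peanut butter biscuits"]},
]

# Convert each dict into a validated PetInfo instance.
pets = [PetInfo.model_validate(item) for item in sample_output]
for pet in pets:
    print(pet.pet_name, pet.pet_type, pet.items)

# A dict that is missing the required pet_name field is rejected.
try:
    PetInfo.model_validate({"pet_type": "dog", "items": []})
    missing_name_rejected = False
except ValidationError:
    missing_name_rejected = True
```

Constrained decoding already keeps the output within the schema, so this step is mainly a convenience for attribute access and a safety net if the results are stored and reloaded later.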