Flan-T5 Large Fine-Tuned on EFRA Dataset

This model is a fine-tuned version of Flan-T5 Large, trained on the EFRA dataset to summarize legal documents related to food regulations and policies.

Model Description

Flan-T5 is an instruction-tuned encoder-decoder (sequence-to-sequence) model that treats every task as text-to-text. This fine-tuned version is optimized for summarizing legal text on food legislation, regulatory requirements, and compliance documents.

Fine-Tuning Details

Base Model: google/flan-t5-large
Dataset: EFRA (a curated dataset of legal documents in the food domain)
Objective: Summarization of legal documents
Framework: Hugging Face Transformers

Applications

This model is suitable for:

Summarizing legal texts in the food domain
Extracting key information from lengthy regulatory documents
Assisting legal professionals and food companies in understanding compliance requirements

Example Usage

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("giuid/flan_t5_xl_summarization_v2")
tokenizer = AutoTokenizer.from_pretrained("giuid/flan_t5_xl_summarization_v2")

# Input text
input_text = "Your lengthy legal document text here..."

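# Tokenize (inputs longer than 512 tokens are truncated) and generate the summary
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
# Passing **inputs forwards both input_ids and attention_mask to generate()
outputs = model.generate(**inputs, max_length=150, num_beams=5, early_stopping=True)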

# Decode summary
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
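
Handling Long Documents

The example above truncates the input at 512 tokens, so anything beyond that point in a lengthy regulatory document is ignored. One possible workaround, sketched below, is to split the tokenized document into overlapping windows, summarize each window separately, and join the partial summaries. This is only an illustration and not necessarily how the model was trained or evaluated: the helper names split_into_chunks and summarize_long_text, the overlap of 32 tokens, and the simple concatenation of partial summaries are all assumptions made for this sketch.

```python
from typing import Callable, List


def split_into_chunks(token_ids: List[int], max_tokens: int = 512,
                      overlap: int = 32) -> List[List[int]]:
    """Split token ids into windows of at most max_tokens that share `overlap` ids.

    Hypothetical helper for illustration; the overlap value is an arbitrary choice.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    step = max_tokens - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_tokens])
        # Stop once the current window already reaches the end of the sequence.
        if start + max_tokens >= len(token_ids):
            break
    return chunks


def summarize_long_text(token_ids: List[int],
                        summarize_ids: Callable[[List[int]], str],
                        max_tokens: int = 512, overlap: int = 32) -> str:
    """Summarize each window with `summarize_ids` and join the partial summaries.

    `summarize_ids` is a caller-supplied function (an assumption of this sketch)
    that maps one window of token ids to a summary string.
    """
    parts = [summarize_ids(chunk)
             for chunk in split_into_chunks(token_ids, max_tokens, overlap)]
    return " ".join(parts)
```

To use this with the model, summarize_ids could decode a window back to text with tokenizer.decode and then run the tokenize, generate, and decode steps from the example above. Because the tokenizer appends an end-of-sequence token, choosing max_tokens slightly below 512 leaves room for it. Joined partial summaries can be repetitive, so a second summarization pass over the joined text may be worth trying.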