---
language: en
pipeline_tag: text2text-generation
tags:
- text-to-sql
- t5
- natural-language-processing
- sql
license: apache-2.0
datasets:
- gretelai/synthetic_text_to_sql
base_model:
- Salesforce/codet5-base
---

# Text-to-SQL T5 Model (`16pramodh/t2s_model`)

## Model Description

This is a **T5-based text-to-SQL model**, fine-tuned from `Salesforce/codet5-base` on the `gretelai/synthetic_text_to_sql` dataset, that converts **natural language questions** into **SQL queries**. It takes an input of the form

`natural language query [SEP] table schema`

and produces a SQL statement grounded in the provided database schema.

The model loads as a `T5ForConditionalGeneration` and supports **text2text-generation** via the Hugging Face Inference API.

---

## Intended Use

- **Input:** English natural language question **plus** the database schema.
- **Output:** SQL query that can be executed on the described database.

---

## Example

**Input:**

```
Get the names and emails of all customers who signed up after January 1, 2024 [SEP] CREATE TABLE customers (customer_id INT PRIMARY KEY, name VARCHAR(50), email VARCHAR(100), signup_date DATE);
```

**Output:**

```sql
SELECT name, email FROM customers WHERE signup_date > '2024-01-01';
```

---

## How to Use

### Hugging Face Inference API

```bash
curl -X POST \
  -H "Authorization: Bearer YOUR_HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Get the names and emails of all customers who signed up after January 1, 2024 [SEP] CREATE TABLE customers (customer_id INT PRIMARY KEY, name VARCHAR(50), email VARCHAR(100), signup_date DATE);"}' \
  https://api-inference.huggingface.co/models/16pramodh/t2s_model
```

### Python (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "16pramodh/t2s_model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Input format: <natural language question> [SEP] <table schema>
input_text = "Get the names and emails of all customers who signed up after January 1, 2024 [SEP] CREATE TABLE customers (customer_id INT PRIMARY KEY, name VARCHAR(50), email VARCHAR(100), signup_date DATE);"

inputs = tokenizer(input_text, return_tensors="pt")
# Raise the token budget: generate() defaults to a short maximum length,
# which can truncate longer SQL statements.
outputs = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
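### Python (pipeline)

Since the card declares `pipeline_tag: text2text-generation`, the model can also be driven through the high-level `pipeline` helper. This is a minimal sketch, not part of the original card; the `max_new_tokens=128` cap is an assumed value chosen so longer queries are not cut off.

```python
from transformers import pipeline

# Load the model under the task declared in the card's metadata.
text2sql = pipeline("text2text-generation", model="16pramodh/t2s_model")

question = "Get the names and emails of all customers who signed up after January 1, 2024"
schema = (
    "CREATE TABLE customers (customer_id INT PRIMARY KEY, "
    "name VARCHAR(50), email VARCHAR(100), signup_date DATE);"
)

# The model expects "<question> [SEP] <schema>" as a single input string.
result = text2sql(f"{question} [SEP] {schema}", max_new_tokens=128)
print(result[0]["generated_text"])
```

The pipeline wraps the same tokenizer and `generate()` call as the explicit example above, so both paths should produce the same SQL for the same input.

---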