---
library_name: transformers
tags: [text2sql, sql-generation, t5, natural-language-processing]
---

# Model Card for ThotaBhanu/t5_sql_askdb

## Model Details

### Model Description

This model is a **T5-based Natural Language to SQL** converter, fine-tuned on the **WikiSQL dataset**. It is designed to convert **English natural language queries** into **SQL queries** that can be executed on relational databases.

- **Developed by:** Bhanu Prasad Thota
- **Shared by:** Bhanu Prasad Thota
- **Model type:** T5-based Sequence-to-Sequence Model
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** `t5-large`

This model is particularly useful for **text-to-SQL applications**, allowing users to **query databases using plain English** instead of writing SQL.

---

## Model Sources

- **Repository:** [https://huggingface.co/ThotaBhanu/t5_sql_askdb](https://huggingface.co/ThotaBhanu/t5_sql_askdb)
- **Paper [optional]:** N/A
- **Demo [optional]:** Coming soon

---

## Uses

### Direct Use

- Convert **natural language questions** into **SQL queries**
- Assist in **database query automation**
- Can be used in **chatbots, data analytics tools, and enterprise database search systems**

### Downstream Use

- Can be **fine-tuned** further on **custom datasets** to improve domain-specific SQL generation
- Can be integrated into **business intelligence tools** for better user interaction

### Out-of-Scope Use

- The model does **not infer database schema** automatically
- May generate incorrect SQL for **complex nested queries or multi-table joins**
- Not suitable for **non-relational (NoSQL) databases**

---

## Bias, Risks, and Limitations

- The model may not **always generate valid SQL** for **custom database schemas**
- Assumes **consistent column naming**, which may not always be the case in enterprise databases
- Performance depends on **how well the input query aligns** with the training data format

### Recommendations

- Always **validate generated SQL** before executing
  on a live database
- Use **schema-aware** validation methods for production environments
- Consider **fine-tuning the model** on domain-specific SQL queries

---

## How to Get Started with the Model

Use the code below to generate SQL queries from natural language:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load model and tokenizer
model_name = "ThotaBhanu/t5_sql_askdb"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Function to convert query to SQL
def generate_sql(query):
    # Prefix the question with the task prompt used by this card
    input_text = f"Convert to SQL: {query}"
    inputs = tokenizer(input_text, return_tensors="pt")
    # The default generation length is short and can truncate SQL;
    # 128 mirrors the preprocessing max length listed under Training Data
    output = model.generate(**inputs, max_length=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
query = "Find all employees who joined in 2020"
sql_query = generate_sql(query)
print(f"📝 Query: {query}")
print(f"🛠 Generated SQL: {sql_query}")
```

## Training Details

### Training Data

- **Dataset:** WikiSQL
- **Size:** 80,654 pairs of natural language questions and SQL queries
- **Preprocessing:** Tokenization using T5Tokenizer, max length 128

### Training Procedure

- **Training framework:** Hugging Face Transformers + PyTorch
- **Hardware used:** NVIDIA V100 GPU
- **Optimizer:** AdamW
- **Learning rate:** 5e-5
- **Batch size:** 8
- **Epochs:** 5

#### Training Hyperparameters

- **Training precision:** Mixed precision (fp16)
- **Gradient accumulation:** Yes (to reach a larger effective batch size)

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

**BibTeX:**

    @misc{t5_sql_askdb,
      author = {Bhanu Prasad Thota},
      title = {T5-SQL AskDB Model},
      year = {2025},
      publisher = {Hugging Face},
      howpublished = {\url{https://huggingface.co/ThotaBhanu/t5_sql_askdb}}
    }

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
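
## Example: Validating Generated SQL

The Recommendations section advises validating generated SQL before running it on a live database. The block below is a minimal sketch of one way to do that, assuming SQLite as a stand-in engine: it loads the schema into an in-memory database and asks SQLite to compile the query with `EXPLAIN QUERY PLAN`, which checks syntax and table and column names without executing the query against real data. The function name `validate_sql`, the `employees` schema, and the sample queries are hypothetical and are not part of the model or its repository.

```python
import sqlite3


def validate_sql(sql: str, schema_ddl: str) -> tuple[bool, str]:
    """Return (is_valid, message) for a single SELECT checked against schema_ddl."""
    statement = sql.strip().rstrip(";").strip()

    # Only allow read-only queries (assumption: the application never needs writes).
    if not statement.lower().startswith("select"):
        return False, "only SELECT statements are allowed"

    # Conservative check: reject anything that looks like stacked statements.
    # This also rejects a literal ';' inside a string value.
    if ";" in statement:
        return False, "multiple statements are not allowed"

    conn = sqlite3.connect(":memory:")
    try:
        # Build an empty copy of the schema, then let SQLite compile the query.
        conn.executescript(schema_ddl)
        conn.execute(f"EXPLAIN QUERY PLAN {statement}")
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()
    return True, "ok"


# Hypothetical schema and queries for illustration only.
schema = "CREATE TABLE employees (id INTEGER, name TEXT, join_year INTEGER);"
print(validate_sql("SELECT name FROM employees WHERE join_year = 2020", schema))
print(validate_sql("SELECT salary FROM employees", schema))
```

This only covers SQLite's dialect. For a production database running a different engine, the same idea would need that engine's own parser or dry-run facility, and a passing check does not guarantee that the query answers the user's question correctly.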