Model Description
This model is a fine-tuned version of unsloth/Meta-Llama-3.1-8B optimized for Text-to-SQL generation. It was fine-tuned with the Unsloth library using LoRA (Low-Rank Adaptation), a parameter-efficient method that trains small adapter matrices instead of the full weights. The training data consists of the first 5000 rows of the Clinton/Text-to-sql-v1 dataset.
- Developed by: Vedant Rajpurohit
- Model type: Causal Language Model
- Language(s): English
- Fine-tuned from model: unsloth/Meta-Llama-3.1-8B
- Model size: 8.03B parameters
- Precision: BF16
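To give a rough sense of what "parameter-efficient" means here, the sketch below counts the weights a LoRA adapter would add. The card does not state the LoRA rank or the target modules, so the rank of 16 and the choice of all seven attention and MLP projections are assumptions for illustration only. The layer shapes come from the public Llama 3.1 8B configuration (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers).

```python
# Projection shapes as (in_features, out_features) for one Llama 3.1 8B layer.
PROJECTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32
TOTAL_PARAMS = 8.03e9  # "Model size" as listed on this card


def lora_param_count(rank: int) -> int:
    """Each adapted matrix gains A (rank x in) and B (out x rank)."""
    per_layer = sum(
        rank * (d_in + d_out) for d_in, d_out in PROJECTION_SHAPES.values()
    )
    return per_layer * NUM_LAYERS


ASSUMED_RANK = 16  # assumption: the card does not state the LoRA rank
trainable = lora_param_count(ASSUMED_RANK)
print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of the full model:   {trainable / TOTAL_PARAMS:.2%}")
```

Under these assumptions the adapter holds about 42 million weights, roughly half a percent of the full model. A different rank or a smaller set of target modules would scale that figure proportionally.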
Direct Use
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the model and tokenizer from the Hugging Face Hub
model_name = "Vedant3907/Text-to-Sql-llama3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)
model.eval()
# Define your test prompt
sql_prompt = """Below are SQL table schemas paired with instruction that describes a task.
Using valid SQLite, write a response that appropriately completes the request for the provided tables.
### Instruction: What is the 2007 result when the 2010 result was 2r, at the US Open?
### Input: CREATE TABLE table_name_91 ( tournament VARCHAR )
### Response:"""
# Tokenize input
inputs = tokenizer(sql_prompt, return_tensors="pt").to(model.device)
# Generate SQL query
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,  # Use sampling for more diverse outputs
)
# Decode only the newly generated tokens so the echoed prompt is not printed
prompt_length = inputs["input_ids"].shape[1]
generated_sql = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print("Generated SQL Query:")
print(generated_sql)
# Example output (varies between runs because sampling is enabled):
# SELECT 2007 FROM table_name_91 WHERE 2010 = "2r" AND tournament = "us open"
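For repeated queries it can help to wrap the prompt format in small helpers. The following is a minimal sketch: `build_prompt` and `extract_sql` are hypothetical names, not part of this model or of any library, and the template is copied from the example prompt above as it appears on this card, so its exact whitespace may differ from what was used in training.

```python
PROMPT_TEMPLATE = """Below are SQL table schemas paired with instruction that describes a task.
Using valid SQLite, write a response that appropriately completes the request for the provided tables.
### Instruction: {instruction}
### Input: {schema}
### Response:"""

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, schema: str) -> str:
    """Fill the card's prompt template with a question and its CREATE TABLE schema."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction.strip(), schema=schema.strip()
    )


def extract_sql(decoded: str) -> str:
    """Return the text after the first response marker, cut at any following '###' section."""
    if RESPONSE_MARKER in decoded:
        decoded = decoded.partition(RESPONSE_MARKER)[2]
    return decoded.split("###", 1)[0].strip()


prompt = build_prompt(
    "How many tournaments are listed?",
    "CREATE TABLE table_name_91 ( tournament VARCHAR )",
)
print(prompt)
```

`extract_sql` is useful when the decoded text still contains the prompt, or when the model keeps generating a further "### Instruction" section after the query.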
Bias, Risks, and Limitations
- The model was trained on only the first 5000 rows of the dataset, for 250 steps.
- The model may generate incorrect or ambiguous SQL queries for instructions that are unclear or outside the training distribution.
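Because generated queries can be wrong, one cheap safeguard is to ask SQLite to compile each query against the provided schema before using it. The sketch below uses only Python's standard `sqlite3` module; `check_sql` is a hypothetical helper name, and the check catches syntax errors and unknown tables or columns, not queries that are valid but answer the wrong question.

```python
import sqlite3


def check_sql(schema: str, query: str) -> tuple[bool, str]:
    """Return (ok, message) after asking SQLite to compile `query` against `schema`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)        # create the empty tables
        conn.execute(f"EXPLAIN {query}")  # compile the statement without running it
        return True, "ok"
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)
    finally:
        conn.close()


schema = "CREATE TABLE table_name_91 ( tournament VARCHAR )"
print(check_sql(schema, "SELECT tournament FROM table_name_91"))
print(check_sql(schema, "SELECT winner FROM table_name_91"))
```

A passing check means only that the statement compiles. Reviewing the result against the original question is still needed.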
Training Details
Dataset
- Dataset Name: Clinton/Text-to-sql-v1
- Rows Used: First 5000 rows of the dataset.
Training Procedure
The model was fine-tuned using the Unsloth library with LoRA adapters, enabling efficient training. Below are the hyperparameters used:
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 10,  # 4% of 250 steps
    max_steps = 250,
    learning_rate = 1e-4,
    fp16 = not is_bfloat16_supported(),  # fall back to FP16 where BF16 is unavailable
    bf16 = is_bfloat16_supported(),
    logging_steps = 10,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "cosine",
    seed = 3407,
    output_dir = "outputs",
    report_to = "none"
)
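The hyperparameters above imply a few derived quantities that are worth spelling out. The sketch below computes them and approximates the learning-rate curve as linear warmup followed by cosine decay to zero. It assumes a single GPU and one dataset row per training sequence (no packing); neither is stated explicitly on this card, and `lr_at` is an illustrative helper rather than the scheduler code the trainer actually ran.

```python
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
MAX_STEPS = 250
WARMUP_STEPS = 10
PEAK_LR = 1e-4
NUM_GPUS = 1         # assumption: a single GPU
DATASET_ROWS = 5000  # rows used, as stated under "Dataset"

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
examples_seen = effective_batch * MAX_STEPS  # assumes one row per sequence
epochs = examples_seen / DATASET_ROWS


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"examples seen:        {examples_seen}")
print(f"epochs over the data: {epochs:.2f}")
for step in (0, 5, 10, 130, 250):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

If those assumptions hold, 250 steps at an effective batch size of 8 cover 2000 examples, which is less than one pass over the 5000 selected rows.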
Hardware
- Trained on Google Colab using a T4 GPU