sourabhd commited on
Commit
42c6c72
1 Parent(s): 4c4761e

Add model card

Browse files
Files changed (1) hide show
  1. README.md +216 -0
README.md ADDED
@@ -0,0 +1,216 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: cc-by-nc-sa-4.0
3
+ language:
4
+ - en
5
+ library_name: transformers
6
+ tags:
7
+ - text-to-sql
8
+ - text2sql
9
+ - nlp2sql
10
+ - nlp-to-sql
11
+ - SQL
12
+ ---
13
+ # Model Card for text2sql
14
+
15
+ <!-- Provide a quick summary of what the model is/does. -->
16
+
17
+ LLM instruction finetuned for Text-to-SQL task.
18
+
19
+ ## Model Details
20
+
21
+ ### Model Description
22
+
23
+ <!-- Provide a longer summary of what this model is. -->
24
+
25
+ - **Developed by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/)
26
+ - **Funded by :** [dataeaze systems pvt ltd](https://www.dataeaze.io/)
27
+ - **Shared by :** [dataeaze systems pvt ltd](https://www.dataeaze.io/)
28
+ - **Model type:** LlamaForCausalLM
29
+ - **Language(s) (NLP):** English
30
+ - **License:** [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) Model is made available under non-commercial use for research purposes only. For commercial usage please connect at [email protected]
31
+ - **Finetuned from model :** [CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf)
32
+
33
+
34
+ ## Uses
35
+
36
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
37
+
38
+ ### Direct Use
39
+
40
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
41
+ Model can be used a tool to convert queries in expressed in natural language (English) to SQL statements
42
+
43
+
44
+ ### Downstream Use
45
+
46
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
47
+ The model could be used as the initial stage in a data analytics / business intelligence application pipeline.
48
+
49
+
50
+ ### Out-of-Scope Use
51
+
52
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
53
+
54
+ Model has been fine tuned on a specific task of converting English language statements to SQL queries.
55
+ Any use beyond this is not guaranteed to be accurate.
56
+
57
+ ## Bias, Risks, and Limitations
58
+
59
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
60
+
61
+ - **Bias:** Trained for English language only.
62
+ - **Risk:** Guardrails are reliant on the base models CodeLlama (Llama2). Finetuning could impact this behaviour.
63
+ - **Limitations:** Intended to be a small model optimised for inference. Does not provide SoTA results on accuracy.
64
+
65
+
66
+ ## How to Get Started with the Model
67
+
68
+ Use the code below to get started with the model.
69
+
70
+ ```
71
+ import torch
72
+ from transformers import AutoModelForCausalLM, AutoTokenizer
73
+
74
+ model = AutoModelForCausalLM.from_pretrained(
75
+ "dataeaze/dataeaze-text2sql-codellama_7b_instruct-clinton_text_to_sql_v1",
76
+ torch_dtype=torch.bfloat16,
77
+ device_map='auto'
78
+ )
79
+
80
+ tokenizer = AutoTokenizer.from_pretrained("dataeaze/dataeaze-text2sql-codellama_7b_instruct-clinton_text_to_sql_v1")
81
+ # print("model device :", model.device)
82
+ tokenizer.pad_token = tokenizer.eos_token
83
+ model.eval()
84
+
85
+ prompt = """ Below are sql tables schemas paired with instruction that describes a task.
86
+ Using valid SQLite, write a response that appropriately completes the request for the provided tables.
87
+ ### Instruction: How many transactions were made by a customer in a specific month?
88
+ ### Database: RewardsProgramDB61
89
+ ### Input:
90
+ CREATE SCHEMA RewardsProgram;
91
+
92
+ CREATE TABLE Customer (
93
+ CustomerID INT NOT NULL AUTO_INCREMENT,
94
+ FirstName VARCHAR(50) NOT NULL,
95
+ LastName VARCHAR(50) NOT NULL,
96
+ Email VARCHAR(100) UNIQUE NOT NULL,
97
+ Phone VARCHAR(20) UNIQUE,
98
+ DateOfBirth DATE,
99
+ PRIMARY KEY (CustomerID)
100
+ );
101
+
102
+ CREATE TABLE Membership (
103
+ MembershipID INT NOT NULL AUTO_INCREMENT,
104
+ MembershipType VARCHAR(50) NOT NULL,
105
+ DiscountPercentage DECIMAL(5, 2) NOT NULL,
106
+ ValidFrom DATETIME,
107
+ ValidTo DATETIME,
108
+ CustomerID INT NOT NULL,
109
+ PRIMARY KEY (MembershipID),
110
+ FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
111
+ );
112
+
113
+ CREATE TABLE Transaction (
114
+ TransactionID INT NOT NULL AUTO_INCREMENT,
115
+ TransactionDate TIMESTAMP,
116
+ TotalAmount DECIMAL(10, 2) NOT NULL,
117
+ CustomerID INT NOT NULL,
118
+ PRIMARY KEY (TransactionID),
119
+ FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
120
+ );
121
+
122
+ CREATE TABLE TransactionDetail (
123
+ TransactionDetailID INT NOT NULL AUTO_INCREMENT,
124
+ TransactionID INT NOT NULL,
125
+ ProductID INT NOT NULL,
126
+ Quantity INT NOT NULL,
127
+ UnitPrice DECIMAL(10, 2) NOT NULL,
128
+ PRIMARY KEY (TransactionDetailID),
129
+ FOREIGN KEY (TransactionID) REFERENCES Transaction(TransactionID),
130
+ FOREIGN KEY (ProductID) REFERENCES Product(ProductID)
131
+ );
132
+
133
+ CREATE TABLE Product (
134
+ ProductID INT NOT NULL AUTO_INCREMENT,
135
+ ProductName VARCHAR(100) NOT NULL,
136
+ UnitPrice DECIMAL(10, 2) NOT NULL,
137
+ AvailableQuantity INT NOT NULL,
138
+ CreatedDate DATETIME,
139
+ PRIMARY KEY (ProductID)
140
+ );
141
+
142
+ ALTER TABLE Membership ADD CONSTRAINT FK_Membership_Customer FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID);
143
+
144
+ ALTER TABLE TransactionDetail ADD CONSTRAINT FK_TransactionDetail_Transaction FOREIGN KEY (TransactionID) REFERENCES Transaction(TransactionID);
145
+
146
+ ALTER TABLE TransactionDetail ADD CONSTRAINT FK_TransactionDetail_Product FOREIGN KEY (ProductID) REFERENCES Product(ProductID);"
147
+ """
148
+
149
+ input_ids = tokenizer(prompt, padding=True, return_tensors='pt')
150
+ outputs = model.generate(
151
+ input_ids=input_ids['input_ids'].to(model.device),
152
+ attention_mask=input_ids['attention_mask'].to(model.device),
153
+ max_new_tokens=3072,
154
+ )
155
+
156
+ generated_query = tokenizer.decode(outputs[0], skip_special_tokens=True)
157
+ print(generated_query)
158
+
159
+
160
+ ```
161
+
162
+
163
+ ## Evaluation
164
+
165
+ <!-- This section describes the evaluation protocols and provides the results. -->
166
+
167
+ ### Testing Data & Metrics
168
+
169
+ #### Testing Data
170
+
171
+ <!-- This should link to a Dataset Card if possible. -->
172
+
173
+ [SPIDER dataset Test Set](https://yale-lily.github.io/spider)
174
+
175
+
176
+ #### Metrics
177
+
178
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
179
+
180
+ SQL queries are matched against the correct answer, with two types of evaluation
181
+ * Execution with Values
182
+ * Exact Set Match without Values
183
+
184
+ ### Results
185
+
186
+ ```
187
+ model-index:
188
+ - name: dataeaze/dataeaze-text2sql-codellama_7b_instruct-dzsql
189
+ results:
190
+ - task:
191
+ type: text-to-sql
192
+ dataset:
193
+ name: SPIDER 1.0
194
+ type: text-to-sql
195
+ metrics:
196
+ - name: Execution with Values
197
+ type: Execution with Values
198
+ value: 64.3
199
+ - name: Exact Set Match without Values
200
+ type: Exact Set Match without Values
201
+ value: 29.6
202
+ source:
203
+ name: Spider 1.0 - Leaderboard
204
+ url: https://yale-lily.github.io/spider
205
+ ```
206
+
207
+
208
+ ## Model Card Authors
209
+
210
+ * Suyash Chougule
211
+ * Chittaranjan Rathod
212
+ * Sourabh Daptardar
213
+
214
+ ## Model Card Contact
215
+
216
+ "dataeaze systems" <[email protected]>