julioc-p committed · verified
Commit 5cc2d1b · 1 Parent(s): f46c824

Update README.md

Files changed (1):
  1. README.md +202 -157
README.md CHANGED
@@ -1,202 +1,247 @@
1
  ---
2
  base_model: mistralai/Mistral-7B-Instruct-v0.1
3
  library_name: peft
4
  ---
5
 
6
- # Model Card for Model ID
7
-
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
 
12
  ## Model Details
13
 
14
  ### Model Description
15
 
16
- <!-- Provide a longer summary of what this model is. -->
17
-
18
-
19
-
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
-
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
 
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
 
58
  ## Bias, Risks, and Limitations
59
 
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
-
70
  ## How to Get Started with the Model
71
 
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
-
76
- ## Training Details
77
 
78
  ### Training Data
79
 
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
 
93
  #### Training Hyperparameters
94
 
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
 
103
  ## Evaluation
104
 
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
  ### Testing Data, Factors & Metrics
108
 
109
  #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
 
121
  #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
 
127
  ### Results
128
 
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
 
141
  ## Environmental Impact
142
 
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
 
159
  ### Compute Infrastructure
160
 
161
- [More Information Needed]
162
-
163
  #### Hardware
164
-
165
- [More Information Needed]
166
 
167
  #### Software
168
 
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
 
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
200
  ### Framework versions
201
-
202
- - PEFT 0.13.2
1
  ---
2
  base_model: mistralai/Mistral-7B-Instruct-v0.1
3
  library_name: peft
4
+ license: mit
5
+ datasets:
6
+ - julioc-p/Question-Sparql
7
+ language:
8
+ - de
9
+ - en
10
+ metrics:
11
+ - f1
12
+ - precision
13
+ - recall
14
+ tags:
15
+ - code
16
+ - text-to-sparql
17
+ - sparql
18
+ - wikidata
19
+ - german
20
  ---
21
 
22
+ This model is a fine-tuned version of `mistralai/Mistral-7B-Instruct-v0.1` for generating SPARQL queries from German natural language questions, specifically targeting the Wikidata knowledge graph.
23
 
24
  ## Model Details
25
 
26
  ### Model Description
27
 
28
+ The model was fine-tuned using QLoRA and is loaded with 4-bit quantization. It takes a German natural-language question as input and produces a corresponding SPARQL query that can be executed against the Wikidata knowledge graph. It is part of a series of experiments investigating the impact of continual multilingual pre-training on cross-lingual transferability and task-specific performance.
29
 
30
+ - **Developed by:** Julio Cesar Perez Duran
31
+ - **Funded by:** DFKI
32
+ - **Model type:** Decoder-only Transformer-based language model
33
+ - **Language(s) (NLP):** de (German)
34
+ - **License:** mit
35
+ - **Fine-tuned from model:** `mistralai/Mistral-7B-Instruct-v0.1`
36
 
37
  ## Bias, Risks, and Limitations
38
 
39
+ - **Entity/Relationship Linking Bottleneck:** The model's primary limitation is difficulty in accurately mapping German textual entities and relationships to their correct Wikidata identifiers (QIDs and PIDs) without explicit contextual aid. Even when the generated SPARQL is structurally valid, the chosen entities or properties are often incorrect, which significantly reduced recall.
40
  ## How to Get Started with the Model
41
 
42
+ The following Python script shows how to load the model and tokenizer with the Hugging Face Transformers and PEFT libraries and generate a SPARQL query.
43
+
44
+ ```python
45
+ import torch
46
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
47
+ import re
48
+
49
+ model_id = "julioc-p/mistral_de_txt_sparql_4bit"
50
+ base_model_for_tokenizer = "mistralai/Mistral-7B-Instruct-v0.1"
51
+
52
+ # Configuration for 4-bit quantization
53
+ bnb_config = BitsAndBytesConfig(
54
+ load_in_4bit=True,
55
+ bnb_4bit_quant_type="nf4",
56
+ bnb_4bit_compute_dtype=torch.float16,
57
+ bnb_4bit_use_double_quant=False,
58
+ )
59
+
60
+ model = AutoModelForCausalLM.from_pretrained(
61
+ model_id,
62
+ quantization_config=bnb_config,
63
+ device_map="auto"  # dispatch layers automatically across available devices
64
+ )
65
+ tokenizer = AutoTokenizer.from_pretrained(base_model_for_tokenizer)
66
+
67
+ if tokenizer.pad_token is None:
68
+ tokenizer.pad_token = tokenizer.eos_token
69
+ model.config.pad_token_id = tokenizer.pad_token_id
70
+
71
+
72
+ sparql_pattern_strict = re.compile(
73
+ r"""
74
+ (SELECT|ASK|CONSTRUCT|DESCRIBE) # Match SPARQL query type
75
+ .*? # Match any characters (non-greedy)
76
+ \} # Match the first closing curly brace
77
+ ( # Start of optional block for trailing clauses
78
+ (?: # Non-capturing group for one or more trailing clauses
79
+ \s* # Match any whitespace
80
+ (?: # Non-capturing group for specific clauses
81
+ (?:(?:GROUP|ORDER)\s+BY|HAVING)\s+.+?\s*(?=\s*(?:(?:GROUP|ORDER)\s+BY|HAVING|LIMIT|OFFSET|VALUES|$)) | # GROUP BY, ORDER BY, HAVING
82
+ LIMIT\s+\d+ | # LIMIT clause
83
+ OFFSET\s+\d+ | # OFFSET clause
84
+ VALUES\s*(?:\{.*?\}|\w+|\(.*?\)) # VALUES clause
85
+ )
86
+ )* # Match zero or more trailing clauses
87
+ )
88
+ """,
89
+ re.DOTALL | re.IGNORECASE | re.VERBOSE,
90
+ )
91
+
92
+ def extract_sparql(text):
93
+ code_block_match = re.search(
94
+ r"```(?:sparql)?\s*(.*?)\s*```", text, re.DOTALL | re.IGNORECASE
95
+ )
96
+ if code_block_match:
97
+ text_to_search = code_block_match.group(1)
98
+ else:
99
+ text_to_search = text
100
+
101
+ match = sparql_pattern_strict.search(text_to_search)
102
+ if match:
103
+ return match.group(0).strip()
104
+ else:
105
+ # Fallback to simpler regex if strict pattern doesn't match
106
+ fallback_match = re.search(
107
+ r"(SELECT|ASK|CONSTRUCT|DESCRIBE).*?\}",
108
+ text_to_search,
109
+ re.DOTALL | re.IGNORECASE,
110
+ )
111
+ if fallback_match:
112
+ return fallback_match.group(0).strip()
113
+ return ""
114
+
115
+ # --- Example usage ---
116
+ question = "Was ist der Siedepunkt von Wasser?"
117
+ knowledge_graph_target = "Wikidata"
118
+
119
+ prompt_content = f"Write a SparQL query that answers this request: '{question}' from the knowledge graph {knowledge_graph_target}."
120
+
121
+ chat_template = [
122
+ {"role": "user", "content": prompt_content},
123
+ ]
124
+
125
+ inputs = tokenizer.apply_chat_template(
126
+ chat_template,
127
+ tokenize=True,
128
+ add_generation_prompt=True,
129
+ return_tensors="pt"
130
+ ).to(model.device)
131
+
132
+ # Generate the output
133
+ with torch.no_grad():
134
+ outputs = model.generate(
135
+ input_ids=inputs,
136
+ max_new_tokens=512,
137
+ do_sample=True,
138
+ pad_token_id=tokenizer.pad_token_id
139
+ )
140
+
141
+ generated_text_assistant_part = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
142
+ cleaned_sparql = extract_sparql(generated_text_assistant_part)
143
+
144
+ print(f"Frage: {question}")
145
+ print(f"Generierte SPARQL: {cleaned_sparql}")
146
+ print(f"Rohe generierte Textausgabe (Assistent): {generated_text_assistant_part}")
147
+ ```
148
 
149
  ### Training Data
150
 
151
+ The model was fine-tuned on a subset of the `julioc-p/Question-Sparql` dataset. Specifically, for the v1.1 Mistral German model, a 35,000-sample German subset was used.
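As an illustrative sketch of such subset selection (the field names `language`, `text_query`, and `sparql_query` below are assumptions about the dataset schema, not its documented columns):

```python
# Illustrative only: a toy stand-in for rows of the julioc-p/Question-Sparql
# dataset; the field names here are assumed, not the documented schema.
import random

records = [
    {"language": "de", "text_query": "Was ist der Siedepunkt von Wasser?",
     "sparql_query": "SELECT ?v WHERE { wd:Q283 wdt:P2102 ?v }"},
    {"language": "en", "text_query": "What is the boiling point of water?",
     "sparql_query": "SELECT ?v WHERE { wd:Q283 wdt:P2102 ?v }"},
]

def german_subset(rows, n, seed=42):
    """Keep German rows, shuffle deterministically, take at most n samples."""
    de_rows = [r for r in rows if r["language"] == "de"]
    random.Random(seed).shuffle(de_rows)
    return de_rows[:n]

subset = german_subset(records, n=35_000)
print(len(subset))  # 1 for this toy input; 35,000 on the full dataset
```

On the real dataset the same filter-then-sample step would be applied to the full table (e.g. via the `datasets` library) to obtain the 35,000-sample German subset.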
152
 
153
  #### Training Hyperparameters
154
 
155
+ The following hyperparameters were used for the fine-tuning:
156
+ - **LoRA Configuration (for Mistral v1.1):**
157
+ - `r` (LoRA rank): 16 (reduced from 64 for training stability with Mistral, per the thesis)
158
+ - `lora_alpha`: 16 (kept from the initial v1 setup)
159
+ - `lora_dropout`: 0.1
160
+ - `bias`: "none"
161
+ - `task_type`: "CAUSAL_LM"
162
+ - `target_modules`: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` (`lm_head` was excluded for Mistral v1.1; thesis p. 39)
163
+ - **Training Arguments:**
164
+ - `num_train_epochs`: 5
165
+ - `per_device_train_batch_size`: 1
166
+ - `gradient_accumulation_steps`: 8
167
+ - `gradient_checkpointing`: True
168
+ - `optim`: "paged_adamw_32bit"
169
+ - `learning_rate`: 1e-5
170
+ - `weight_decay`: 0.05
171
+ - `bf16`: False
172
+ - `fp16`: True
173
+ - `max_grad_norm`: 1.0
174
+ - `warmup_ratio`: 0.01
175
+ - `lr_scheduler_type`: "cosine"
176
+ - `group_by_length`: True
177
+ - `packing`: False
178
+ - **BitsAndBytesConfig:**
179
+ - `load_in_4bit`: True
180
+ - `bnb_4bit_quant_type`: "nf4"
181
+ - `bnb_4bit_compute_dtype`: `torch.float16`
182
+ - `bnb_4bit_use_double_quant`: False
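As a back-of-the-envelope check of what this adapter configuration trains, the LoRA parameter count can be derived from Mistral-7B's projection shapes (hidden size 4096, grouped-query KV dim 1024, MLP dim 14336, 32 layers; these shapes are assumptions about the base model, not stated in this card):

```python
# Each LoRA adapter on a linear layer of shape (d_in, d_out) adds
# r * (d_in + d_out) trainable parameters (the A and B matrices).
r = 16  # LoRA rank used for Mistral v1.1

# Assumed Mistral-7B-Instruct-v0.1 projection shapes (d_in, d_out):
target_shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),   # grouped-query attention: 8 KV heads x 128
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
n_layers = 32

per_layer = sum(r * (d_in + d_out) for d_in, d_out in target_shapes.values())
total = per_layer * n_layers
print(f"{total:,} trainable LoRA parameters")  # 41,943,040 (~42M, well under 1% of 7B)
```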
183
+
184
+ #### Speeds, Sizes, Times
185
+ - The training took approximately 19-20 hours for 5 epochs on a single NVIDIA V100 GPU.
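With these settings the effective batch size is 1 × 8 = 8, so the reported wall-clock time implies roughly the following step count and throughput (a consistency sketch assuming the 35,000-sample subset and no dropped remainder batches):

```python
samples, epochs = 35_000, 5
per_device_bs, grad_accum = 1, 8
effective_bs = per_device_bs * grad_accum   # 8 examples per optimizer step
steps = samples * epochs // effective_bs    # optimizer steps over 5 epochs
hours = 19.5                                # midpoint of the reported 19-20 h
print(f"{steps:,} steps, ~{hours * 3600 / steps:.2f} s per optimizer step")
# prints "21,875 steps, ~3.21 s per optimizer step"
```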
186
 
187
  ## Evaluation
188
 
189
  ### Testing Data, Factors & Metrics
190
 
191
  #### Testing Data
192
+ 1. **QALD-10 test set (German):** a standardized benchmark with German questions targeting Wikidata; 391 German questions were attempted after filtering.
193
+ 2. **v1 Test Set (German):** 3,500 German held-out examples randomly sampled from the `julioc-p/Question-Sparql` dataset (Wikidata-focused).
194
 
195
  #### Metrics
196
+ The primary evaluation metrics were the QALD-standard macro-averaged F1-score, Precision, and Recall. Non-executable queries were scored P = R = F1 = 0. The percentage of **Executable Queries** was also tracked.
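The scoring scheme can be sketched as follows (a minimal illustration of QALD-style macro-averaging over answer sets, not the official evaluation code):

```python
def prf1(gold: set, pred: set, executable: bool = True):
    """Per-question precision/recall/F1 over answer sets, QALD-style."""
    if not executable:          # non-executable query scores zero
        return 0.0, 0.0, 0.0
    if not gold and not pred:   # both empty counts as a perfect match
        return 1.0, 1.0, 1.0
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def macro(results):
    """Macro-average: mean of the per-question scores."""
    ps, rs, f1s = zip(*(prf1(g, p, ok) for g, p, ok in results))
    n = len(results)
    return sum(ps) / n, sum(rs) / n, sum(f1s) / n

# Toy example: one exact match, one wrong answer, one non-executable query.
results = [
    ({"Q42"}, {"Q42"}, True),
    ({"Q1"}, {"Q2"}, True),
    ({"Q7"}, set(), False),
]
print(macro(results))  # (0.333..., 0.333..., 0.333...)
```

Note how a high macro precision with low macro recall (as in the results below) can arise when the few executable, answer-bearing queries are precise but most questions score zero.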
197
 
198
  ### Results
199
 
200
+ **On QALD-10 (German, N=391):**
201
+ - **Macro F1-Score:** 0.0563
202
+ - **Macro Precision:** 0.6726
203
+ - **Macro Recall:** 0.0563
204
+ - **Executable Queries:** 94.88% (371/391)
205
+ - **Correctness (Exact Match + Both Empty):** 5.63% (22/391)
206
+ - Correct (Exact Match): 4.60% (18/391)
207
+ - Correct (Both Empty): 1.02% (4/391)
208
+
209
+ **On v1 Test Set (German, N=3500):**
210
+ - **Macro F1-Score:** 0.1003
211
+ - **Macro Precision:** 0.7481
212
+ - **Macro Recall:** 0.1006
213
+ - **Executable Queries:** 89.11% (3119/3500)
214
+ - **Correctness (Exact Match + Both Empty):** 9.97% (349/3500)
215
+ - Correct (Exact Match): 2.51% (88/3500)
216
+ - Correct (Both Empty): 7.46% (261/3500)
217
 
218
  ## Environmental Impact
219
+ - **Hardware Type:** 1 x NVIDIA V100 32GB GPU
220
+ - **Hours used:** Approx. 19-20 hours for fine-tuning.
221
+ - **Cloud Provider:** DFKI HPC Cluster
222
+ - **Compute Region:** Germany
223
+ - **Carbon Emitted:** Approx. 2.96 kg CO2eq.
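The reported figure is consistent with the standard estimate emissions ≈ power × time × grid carbon intensity (as in the ML CO2 Impact calculator); the 300 W board power and ~0.5 kg CO2eq/kWh intensity below are illustrative assumptions, not values stated in this card:

```python
power_kw = 0.300   # assumed V100 board power (TDP)
hours = 19.75      # within the reported 19-20 h range
intensity = 0.5    # assumed grid carbon intensity, kg CO2eq per kWh
emissions = power_kw * hours * intensity
print(f"~{emissions:.2f} kg CO2eq")  # ~2.96 kg CO2eq
```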
224
 
225
+ ## Technical Specifications
226
 
227
  ### Compute Infrastructure
228
229
  #### Hardware
230
+ - NVIDIA V100 GPU (32 GB RAM)
231
+ - Approx. 60 GB system RAM
232
 
233
  #### Software
234
+ - Slurm, NVIDIA Enroot, CUDA 11.8.0
235
+ - Python, Hugging Face `transformers`, `peft` (0.13.2), `bitsandbytes`, `trl`, PyTorch.
236
 
237
+ ## More Information
238
+ - **Thesis GitHub:** [https://github.com/julioc-p/cross-lingual-transferability-thesis](https://github.com/julioc-p/cross-lingual-transferability-thesis)
239
+ - **Dataset:** [https://huggingface.co/datasets/julioc-p/Question-Sparql](https://huggingface.co/datasets/julioc-p/Question-Sparql)
240
+ - **Model Link:** [https://huggingface.co/julioc-p/mistral_de_txt_sparql_4bit](https://huggingface.co/julioc-p/mistral_de_txt_sparql_4bit)
241
 
242
  ### Framework versions
243
+ - PEFT 0.13.2
244
+ - Transformers 4.39.3
245
+ - bitsandbytes 0.43.0
246
+ - TRL 0.8.6
247
+ - PyTorch 2.1.0
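For reproducibility, the versions above can be pinned in a `requirements.txt` (package names as published on PyPI):

```text
peft==0.13.2
transformers==4.39.3
bitsandbytes==0.43.0
trl==0.8.6
torch==2.1.0
```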