---
base_model: mistralai/Mistral-7B-Instruct-v0.1
library_name: peft
license: mit
datasets:
- julioc-p/Question-Sparql
language:
- de
- en
metrics:
- f1
- precision
- recall
tags:
- code
- text-to-sparql
- sparql
- wikidata
- german
---

This model is a fine-tuned version of `mistralai/Mistral-7B-Instruct-v0.1` for generating SPARQL queries from German natural language questions, specifically targeting the Wikidata knowledge graph.

## Model Details

### Model Description

The model was fine-tuned using QLoRA with 4-bit quantization. It takes a German natural language question as input and aims to produce a corresponding SPARQL query that can be executed against the Wikidata knowledge graph. It is part of a series of experiments investigating the impact of continual multilingual pre-training on cross-lingual transferability and task-specific performance.

- **Developed by:** Julio Cesar Perez Duran
- **Funded by:** DFKI
- **Model type:** Decoder-only Transformer-based language model
- **Language(s) (NLP):** German (de)
- **License:** MIT
- **Finetuned from model:** `mistralai/Mistral-7B-Instruct-v0.1`

## Bias, Risks, and Limitations

- **Entity/relationship linking bottleneck:** A primary limitation of this model is its difficulty in accurately mapping German textual entities and relationships to their correct Wikidata identifiers (QIDs and PIDs) without explicit contextual aid. The model may generate structurally valid SPARQL whose entities or properties are nonetheless incorrect; this significantly hurt recall.

## How to Get Started with the Model

The following Python script shows how to load the model and tokenizer with the Hugging Face Transformers and PEFT libraries and generate a SPARQL query:

```python
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "julioc-p/mistral_de_txt_sparql_4bit"
base_model_for_tokenizer = "mistralai/Mistral-7B-Instruct-v0.1"

# 4-bit quantization configuration (matches the fine-tuning setup)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # dispatch layers automatically across available devices
)
tokenizer = AutoTokenizer.from_pretrained(base_model_for_tokenizer)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id

sparql_pattern_strict = re.compile(
    r"""
    (SELECT|ASK|CONSTRUCT|DESCRIBE)  # match the SPARQL query form
    .*?                              # any characters (non-greedy)
    \}                               # first closing curly brace
    (                                # optional trailing solution modifiers
        (?:
            \s*
            (?:
                (?:(?:GROUP|ORDER)\s+BY|HAVING)\s+.+?\s*(?=\s*(?:(?:GROUP|ORDER)\s+BY|HAVING|LIMIT|OFFSET|VALUES|$)) |
                LIMIT\s+\d+ |
                OFFSET\s+\d+ |
                VALUES\s*(?:\{.*?\}|\w+|\(.*?\))
            )
        )*                           # zero or more trailing clauses
    )
    """,
    re.DOTALL | re.IGNORECASE | re.VERBOSE,
)


def extract_sparql(text):
    """Extract the first SPARQL query from the model's raw output."""
    code_block_match = re.search(
        r"```(?:sparql)?\s*(.*?)\s*```", text, re.DOTALL | re.IGNORECASE
    )
    text_to_search = code_block_match.group(1) if code_block_match else text

    match = sparql_pattern_strict.search(text_to_search)
    if match:
        return match.group(0).strip()

    # Fall back to a simpler pattern if the strict one does not match
    fallback_match = re.search(
        r"(SELECT|ASK|CONSTRUCT|DESCRIBE).*?\}",
        text_to_search,
        re.DOTALL | re.IGNORECASE,
    )
    return fallback_match.group(0).strip() if fallback_match else ""


# --- Example usage ---
question = "Was ist der Siedepunkt von Wasser?"  # "What is the boiling point of water?"
knowledge_graph_target = "Wikidata"

prompt_content = (
    f"Write a SparQL query that answers this request: '{question}' "
    f"from the knowledge graph {knowledge_graph_target}."
)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt_content}],
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=512,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )

generated_text = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
cleaned_sparql = extract_sparql(generated_text)

print(f"Question: {question}")
print(f"Generated SPARQL: {cleaned_sparql}")
print(f"Raw model output: {generated_text}")
```

### Training Data

The model was fine-tuned on a subset of the `julioc-p/Question-Sparql` dataset. Specifically, for this v1.1 Mistral German model, a 35,000-example German subset was used.

#### Training Hyperparameters

The following hyperparameters were used for the fine-tuning:

- **LoRA configuration (Mistral v1.1):**
  - `r` (LoRA rank): 16 (reduced from 64 for training stability, as described in the thesis)
  - `lora_alpha`: 16 (maintained from the initial v1 setup)
  - `lora_dropout`: 0.1
  - `bias`: "none"
  - `task_type`: "CAUSAL_LM"
  - `target_modules`: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` (`lm_head` was removed for Mistral v1.1; see thesis p. 39)
- **Training arguments:**
  - `num_train_epochs`: 5
  - `per_device_train_batch_size`: 1
  - `gradient_accumulation_steps`: 8
  - `gradient_checkpointing`: True
  - `optim`: "paged_adamw_32bit"
  - `learning_rate`: 1e-5
  - `weight_decay`: 0.05
  - `bf16`: False
  - `fp16`: True
  - `max_grad_norm`: 1.0
  - `warmup_ratio`: 0.01
  - `lr_scheduler_type`: "cosine"
  - `group_by_length`: True
  - `packing`: False
- **BitsAndBytesConfig:**
  - `load_in_4bit`: True
  - `bnb_4bit_quant_type`: "nf4"
  - `bnb_4bit_compute_dtype`: `torch.float16`
  - `bnb_4bit_use_double_quant`: False

#### Speeds, Sizes, Times

- Training took approximately 19-20 hours for 5 epochs on a single NVIDIA V100 GPU.
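As a rough sketch of how these settings map onto `peft` and `transformers` objects (the exact training script is in the thesis repository and may differ; names here are the standard library parameters, not a verbatim copy):

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig, TrainingArguments

# LoRA adapter configuration (Mistral v1.1 settings from the list above)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Trainer arguments; effective batch size = 1 x 8 = 8
training_args = TrainingArguments(
    output_dir="mistral_de_txt_sparql",  # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    learning_rate=1e-5,
    weight_decay=0.05,
    fp16=True,
    max_grad_norm=1.0,
    warmup_ratio=0.01,
    lr_scheduler_type="cosine",
    group_by_length=True,
)

# Quantization used when loading the base model for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)
```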

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

1. **QALD-10 test set (German):** a standardized benchmark with German questions targeting Wikidata; 391 German questions were attempted after filtering.
2. **v1 test set (German):** 3,500 held-out German examples randomly sampled from the `julioc-p/Question-Sparql` dataset (Wikidata-focused).

#### Metrics

The primary evaluation metrics were the QALD-standard macro-averaged F1-score, precision, and recall. Non-executable queries were scored P = R = F1 = 0. The percentage of **executable queries** was also tracked.
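As a sketch of the scoring scheme (a hypothetical helper, not the actual evaluation code): each question gets a precision/recall pair, non-executable queries contribute (0, 0), and F1 is computed per question before macro-averaging.

```python
def macro_scores(per_question):
    """per_question: one (precision, recall) pair per test question;
    non-executable queries contribute (0.0, 0.0)."""
    n = len(per_question)
    macro_p = sum(p for p, _ in per_question) / n
    macro_r = sum(r for _, r in per_question) / n
    # F1 is computed per question, then averaged (QALD convention)
    f1s = [2 * p * r / (p + r) if p + r > 0 else 0.0 for p, r in per_question]
    return macro_p, macro_r, sum(f1s) / n
```

For example, one fully correct query and one non-executable query average to P = R = F1 = 0.5.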

### Results

**On QALD-10 (German, N=391):**
- **Macro F1-score:** 0.0563
- **Macro precision:** 0.6726
- **Macro recall:** 0.0563
- **Executable queries:** 94.88% (371/391)
- **Correctness (exact match + both empty):** 5.63% (22/391)
  - Correct (exact match): 4.60% (18/391)
  - Correct (both empty): 1.02% (4/391)

**On the v1 test set (German, N=3500):**
- **Macro F1-score:** 0.1003
- **Macro precision:** 0.7481
- **Macro recall:** 0.1006
- **Executable queries:** 89.11% (3119/3500)
- **Correctness (exact match + both empty):** 9.97% (349/3500)
  - Correct (exact match): 2.51% (88/3500)
  - Correct (both empty): 7.46% (261/3500)
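The percentages follow directly from the raw counts; a quick sanity check:

```python
def pct(count, total):
    """Share of questions as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)

# QALD-10 (N=391)
assert pct(371, 391) == 94.88      # executable queries
assert pct(18 + 4, 391) == 5.63    # exact match + both empty

# v1 test set (N=3500)
assert pct(3119, 3500) == 89.11    # executable queries
assert pct(88 + 261, 3500) == 9.97 # exact match + both empty
```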

## Environmental Impact

- **Hardware type:** 1 x NVIDIA V100 32 GB GPU
- **Hours used:** approx. 19-20 hours for fine-tuning
- **Cloud provider:** DFKI HPC cluster
- **Compute region:** Germany
- **Carbon emitted:** approx. 2.96 kg CO2eq
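The reported figure can be roughly reproduced back-of-the-envelope; the power draw and grid-intensity values below are assumptions for illustration, not measurements:

```python
# Rough CO2 estimate: energy (kWh) x grid carbon intensity (kg CO2eq/kWh)
gpu_hours = 19.5        # midpoint of the 19-20 h fine-tuning run
gpu_power_kw = 0.30     # assumed V100 board power (300 W TDP)
grid_intensity = 0.506  # assumed kg CO2eq per kWh

energy_kwh = gpu_hours * gpu_power_kw
co2_kg = energy_kwh * grid_intensity
print(f"~{co2_kg:.2f} kg CO2eq")  # ~2.96 kg, consistent with the figure above
```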

## Technical Specifications

### Compute Infrastructure

#### Hardware

- NVIDIA V100 GPU (32 GB VRAM)
- Approx. 60 GB system RAM

#### Software

- Slurm, NVIDIA Enroot, CUDA 11.8.0
- Python, Hugging Face `transformers`, `peft` (0.13.2), `bitsandbytes`, `trl`, PyTorch

## More Information

- **Thesis GitHub:** [https://github.com/julioc-p/cross-lingual-transferability-thesis](https://github.com/julioc-p/cross-lingual-transferability-thesis)
- **Dataset:** [https://huggingface.co/datasets/julioc-p/Question-Sparql](https://huggingface.co/datasets/julioc-p/Question-Sparql)
- **Model:** [https://huggingface.co/julioc-p/mistral_de_txt_sparql_4bit](https://huggingface.co/julioc-p/mistral_de_txt_sparql_4bit)

### Framework versions

- PEFT 0.13.2
- Transformers 4.39.3
- bitsandbytes 0.43.0
- TRL 0.8.6
- PyTorch 2.1.0