# Llama-3-8B-Instruct QED Few-Shot (Both Prompts)
## Model Description

This model is a fine-tuned version of `meta-llama/Meta-Llama-3-8B-Instruct` for the QED (Question-Explanation-Data) task.
It was trained with a few-shot approach: both demonstration examples ("Life of Pi" and "Acute hemolytic transfusion reaction") are included in the prompt, following the QED instruction format.
- Base model: Meta-Llama-3-8B-Instruct
- Fine-tuning method: LoRA (QLoRA, 4-bit)
- Task: Extracting short answers, supporting sentences, and referential equalities from text passages given a question.
## Intended Uses & Limitations
- Intended use: Research on explainable QA, entity and span extraction, and referential reasoning.
- Not intended for: General open-domain QA, medical or legal advice, or production deployment without further validation.
## Training Data
- Dataset: QED (Question-Explanation-Data) dataset
- Prompt format: Each input includes a title, question, and context passage, with the following instruction and two demonstration examples.
## Prompt Format
The model expects prompts in the following format (using Llama-3-Instruct tokens):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an expert at extracting answers and structured explanations from text.
Your response MUST be **valid JSON only** (no extra commentary).
Task
====
Given:
• a **title** for the passage,
• a **question** about the passage, and
• the **context passage** itself,
produce an explanation object with three parts:
1. "answer" – the **shortest span** from the passage that fully answers the question.
2. "selected_sentence" – the **single sentence** in the passage that entails or implies the answer.
3. "referential_equalities" – a list of mappings between phrases in the question and phrases in the selected sentence that refer to the **same real-world entity/event**.
• Each mapping has two keys:
- "question_reference": the exact phrase from the question (**must be a contiguous substring from the question, not from the context or title**).
- "sentence_reference": the exact phrase from the selected sentence (**must be a contiguous substring from the selected sentence, not from the question or title**), or "" (empty string if the entire sentence is the referent).
▸ Use **""** for "sentence_reference" when the entity/event is not named by any specific phrase in the sentence – i.e. the entire sentence acts as the referent (a *bridge* to the whole sentence).
This corresponds to the (start = end = -1) convention in the QED dataset.
Output format
=============
Return **only** JSON in this exact schema:
{
  "answer": "<string from passage>",
  "selected_sentence": "<string from passage>",
  "referential_equalities": [
    {
      "question_reference": "<string from question only>",
      "sentence_reference": "<string from selected_sentence only, or "">",
      "bridge": "<false if not a bridge; otherwise, a string explaining the bridge connection, e.g., 'in', 'for', 'of', 'at', 'on'>"
    }
    ...
  ]
}
Demonstration Example 1:
Title:
Life of Pi
Question:
what is the tigers name in life of pi
Context:
Life of Pi is a Canadian fantasy adventure novel by Yann Martel published in 2001 . The protagonist is Piscine Molitor `` Pi '' Patel , an Indian boy from Pondicherry who explores issues of spirituality and practicality from an early age . He survives 227 days after a shipwreck while stranded on a lifeboat in the Pacific Ocean with a Bengal tiger named Richard Parker .
Expected JSON:
{
  "answer": "Richard Parker",
  "selected_sentence": "He survives 227 days after a shipwreck while stranded on a lifeboat in the Pacific Ocean with a Bengal tiger named Richard Parker .",
  "referential_equalities": [
    {
      "question_reference": "the tiger",
      "sentence_reference": "a Bengal tiger",
      "bridge": false
    },
    {
      "question_reference": "life of pi",
      "sentence_reference": "",
      "bridge": "in"
    }
  ]
}
Demonstration Example 2:
Title:
Acute hemolytic transfusion reaction
Question:
what happens to the rbc in acute hemolytic reaction
Context:
It is also known as an `` immediate hemolytic transfusion reaction '' . This is a medical emergency as it results from rapid destruction of the donor red blood cells by host antibodies ( IgG , IgM ) . It is usually related to ABO blood group incompatibility - the most severe of which often involves group A red cells being given to a patient with group O type blood . Properdin then binds to complement C3 in the donor blood , facilitating the reaction through the alternate pathway cascade . The donor cells also become coated with IgG and are subsequently removed by macrophages in the reticuloendothelial system ( RES ) . Jaundice and disseminated intravascular coagulation ( DIC ) may also occur . The most common cause is clerical error ( i.e. the wrong unit of blood being given to the patient ) .
Expected JSON:
{
  "answer": "rapid destruction of the donor red blood cells by host antibodies ( IgG , IgM )",
  "selected_sentence": "This is a medical emergency as it results from rapid destruction of the donor red blood cells by host antibodies ( IgG , IgM ) .",
  "referential_equalities": [
    {
      "question_reference": "acute hemolytic reaction",
      "sentence_reference": "This",
      "bridge": false
    },
    {
      "question_reference": "the rbc",
      "sentence_reference": "the donor red blood cells",
      "bridge": false
    }
  ]
}
<|eot_id|><|start_header_id|>user<|end_header_id|>
Title: {title}
Question: {question}
Context: {context}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
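A minimal inference sketch, assuming the LoRA adapter is published as `DenisRz/llama3_8b_instruct_qed` and is loaded on top of the base model with `peft`; `SYSTEM_PROMPT` is a placeholder for the full system prompt shown above (including both demonstrations):

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "DenisRz/llama3_8b_instruct_qed"  # adjust if the adapter lives elsewhere

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)

SYSTEM_PROMPT = "..."  # placeholder: paste the full system prompt from above

def qed_extract(title: str, question: str, context: str) -> dict:
    """Build the chat prompt, generate greedily, and parse the JSON reply."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Title: {title}\nQuestion: {question}\nContext: {context}"},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens after the prompt.
    completion = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return json.loads(completion)  # raises ValueError if the model emitted non-JSON
```

Greedy decoding (`do_sample=False`) is a reasonable default here, since the task demands an exact JSON structure rather than diverse generations.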
## Training Hyperparameters
- Model: meta-llama/Meta-Llama-3-8B-Instruct
- LoRA: enabled (`lora_r=32`, `lora_alpha=64`, `lora_dropout=0.05`)
- Quantization: 4-bit (QLoRA), CPU offload enabled
- Epochs: 1
- Batch size: 1 (gradient accumulation steps: 16)
- Learning rate: 2e-5
- Weight decay: 0.001
- Warmup ratio: 0.1
- Optimizer: paged_adamw_8bit
- Precision: bf16
- Max source length: 3072
- Max target length: 1024
- Prompt examples: both (see above)
- Output dir: `models_fine_tuned/llama3_8b_instruct_fewshot_both`
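For reference, a sketch of how these settings map onto `transformers`/`peft` configuration objects. The `nf4` quant type and the `target_modules` choice are assumptions; the card does not state them:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization (QLoRA) with bf16 compute and CPU offload, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",  # assumption: standard QLoRA quant type
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_enable_fp32_cpu_offload=True,  # "CPU offload enabled"
)

# LoRA settings from the hyperparameter list; target_modules is an assumption.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)

training_args = TrainingArguments(
    output_dir="models_fine_tuned/llama3_8b_instruct_fewshot_both",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    weight_decay=0.001,
    warmup_ratio=0.1,
    optim="paged_adamw_8bit",
    bf16=True,
)
```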
## Evaluation Results

Evaluated on 998 validation examples, using official QED metrics at various F1 overlap thresholds (non-strict):
| Overlap | Answer Accuracy | All Mention F1 | Pair F1 |
|---------|-----------------|----------------|---------|
| 0.50    | 82.4%           | 19.6%          | 10.4%   |
| 0.60    | 74.2%           | 19.5%          | 10.3%   |
| 0.70    | 68.2%           | 19.5%          | 10.3%   |
| 0.80    | 63.2%           | 19.5%          | 10.3%   |
| 0.90    | 59.8%           | 19.2%          | 10.0%   |
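Here, "Overlap" is the F1 threshold at which a predicted span counts as matching a gold span. A minimal sketch of that matching rule, assuming simple whitespace tokenization (the official QED evaluation script is the authoritative implementation):

```python
from collections import Counter

def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between two text spans."""
    pred_tokens, gold_tokens = pred.split(), gold.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def span_matches(pred: str, gold: str, threshold: float = 0.5) -> bool:
    """Non-strict match: the prediction counts if token F1 meets the threshold."""
    return token_f1(pred, gold) >= threshold
```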
## Limitations & Ethical Considerations
- The model is trained on a specific dataset and task; it may not generalize to other domains.
- Outputs are not guaranteed to be factually correct or safe for critical applications.
- Always validate outputs before use in downstream tasks; a minimal check is sketched below.
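As a concrete example of such validation, a hypothetical helper (not part of the released code) that checks the span constraints from the prompt, namely that every extracted string occurs verbatim in its required source text:

```python
def validate_qed_output(obj: dict, question: str, context: str) -> list[str]:
    """Return a list of constraint violations; an empty list means the output passes."""
    errors = []
    if obj.get("answer", "") not in context:
        errors.append("answer is not a substring of the context")
    sentence = obj.get("selected_sentence", "")
    if sentence not in context:
        errors.append("selected_sentence is not a substring of the context")
    for ref in obj.get("referential_equalities", []):
        if ref.get("question_reference", "") not in question:
            errors.append("question_reference not found in the question")
        sent_ref = ref.get("sentence_reference", "")
        # An empty sentence_reference is allowed: the whole sentence is the referent.
        if sent_ref and sent_ref not in sentence:
            errors.append("sentence_reference not found in the selected sentence")
    return errors
```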
## Citation

If you use this model or code, please cite the original Llama 3 paper and the QED dataset paper as appropriate.
## Author
- Denis Rize