Llama-3-8B-Instruct QED Few-Shot (Both Prompts)

Model Description

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct for the QED (Question-Explanation-Data) task.
It was trained using a few-shot approach with both demonstration examples ("Life of Pi" and "Acute hemolytic transfusion reaction") included in the prompt, following the QED instruction format.

  • Base model: Meta-Llama-3-8B-Instruct
  • Fine-tuning method: LoRA (QLoRA, 4-bit)
  • Task: Extracting short answers, supporting sentences, and referential equalities from text passages given a question.
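
For reference, here is a minimal inference-loading sketch using transformers and peft. It assumes the adapter is published as DenisRz/llama3_8b_instruct_qed (the repository this card describes) and that bitsandbytes is available for 4-bit loading:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "DenisRz/llama3_8b_instruct_qed"  # assumed adapter repo id

# 4-bit quantized base model, matching the QLoRA setup used for training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)  # attach the LoRA adapter
model.eval()
```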

Intended Uses & Limitations

  • Intended use: Research on explainable QA, entity and span extraction, and referential reasoning.
  • Not intended for: General open-domain QA, medical or legal advice, or production deployment without further validation.

Training Data

  • Dataset: QED (Question-Explanation-Data) dataset
  • Prompt format: Each input includes a title, question, and context passage, with the following instruction and two demonstration examples.

Prompt Format

The model expects prompts in the following format (using Llama-3-Instruct tokens):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an expert at extracting answers and structured explanations from text.
Your response MUST be **valid JSON only** (no extra commentary).

Task
====
Given:
• a **title** for the passage,
• a **question** about the passage, and
• the **context passage** itself,

produce an explanation object with three parts:

1. "answer" – the **shortest span** from the passage that fully answers the question.
2. "selected_sentence" – the **single sentence** in the passage that entails or implies the answer.
3. "referential_equalities" – a list of mappings between phrases in the question and phrases in the selected sentence that refer to the **same real-world entity/event**.

   • Each mapping has two keys:
       - "question_reference": the exact phrase from the question (**must be a contiguous substring from the question, not from the context or title**).
       - "sentence_reference": the exact phrase from the selected sentence (**must be a contiguous substring from the selected sentence, not from the question or title**), or "" (empty string if the entire sentence is the referent).

     ▸ Use **""** for "sentence_reference" when the entity/event is not named by any specific phrase in the sentence – i.e. the entire sentence acts as the referent (a *bridge* to the whole sentence).  
       This corresponds to the (start = end = -1) convention in the QED dataset.

Output format
=============
Return **only** JSON in this exact schema:

{
  "answer": "<string from passage>",
  "selected_sentence": "<string from passage>",
  "referential_equalities": [
    {
      "question_reference": "<string from question only>",
      "sentence_reference": "<string from selected_sentence only, or "">",
      "bridge": "<false if not a bridge; otherwise, a string explaining the bridge connection, e.g., 'in', 'for', 'of', 'at', 'on'>"
    }
    ...
  ]
}

Demonstration Example 1:
Title:
Life of Pi

Question:
what is the tigers name in life of pi

Context:
Life of Pi is a Canadian fantasy adventure novel by Yann Martel published in 2001 . The protagonist is Piscine Molitor `` Pi '' Patel , an Indian boy from Pondicherry who explores issues of spirituality and practicality from an early age . He survives 227 days after a shipwreck while stranded on a lifeboat in the Pacific Ocean with a Bengal tiger named Richard Parker .

Expected JSON:
{
  "answer": "Richard Parker",
  "selected_sentence": "He survives 227 days after a shipwreck while stranded on a lifeboat in the Pacific Ocean with a Bengal tiger named Richard Parker .",
  "referential_equalities": [
    {
      "question_reference": "the tiger",
      "sentence_reference": "a Bengal tiger",
      "bridge": false
    },
    {
      "question_reference": "life of pi",
      "sentence_reference": "",
      "bridge": "in"
    }
  ]
}

Demonstration Example 2:
Title:
Acute hemolytic transfusion reaction

Question:
what happens to the rbc in acute hemolytic reaction

Context:
It is also known as an `` immediate hemolytic transfusion reaction '' . This is a medical emergency as it results from rapid destruction of the donor red blood cells by host antibodies ( IgG , IgM ) . It is usually related to ABO blood group incompatibility - the most severe of which often involves group A red cells being given to a patient with group O type blood . Properdin then binds to complement C3 in the donor blood , facilitating the reaction through the alternate pathway cascade . The donor cells also become coated with IgG and are subsequently removed by macrophages in the reticuloendothelial system ( RES ) . Jaundice and disseminated intravascular coagulation ( DIC ) may also occur . The most common cause is clerical error ( i.e. the wrong unit of blood being given to the patient ) .

Expected JSON:
{
  "answer": "rapid destruction of the donor red blood cells by host antibodies ( IgG , IgM )",
  "selected_sentence": "This is a medical emergency as it results from rapid destruction of the donor red blood cells by host antibodies ( IgG , IgM ) .",
  "referential_equalities": [
    {
      "question_reference": "acute hemolytic reaction",
      "sentence_reference": "This",
      "bridge": false
    },
    {
      "question_reference": "the rbc",
      "sentence_reference": "the donor red blood cells",
      "bridge": false
    }
  ]
}
<|eot_id|><|start_header_id|>user<|end_header_id|>

Title: {title}
Question: {question}
Context: {context}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Training Hyperparameters

  • Model: meta-llama/Meta-Llama-3-8B-Instruct
  • LoRA: enabled (lora_r=32, lora_alpha=64, lora_dropout=0.05)
  • Quantization: 4-bit (QLoRA), CPU offload enabled
  • Epochs: 1
  • Batch size: 1 (gradient accumulation steps: 16)
  • Learning rate: 2e-5
  • Weight decay: 0.001
  • Warmup ratio: 0.1
  • Optimizer: paged_adamw_8bit
  • Precision: bf16
  • Max source length: 3072
  • Max target length: 1024
  • Prompt examples: both (see above)
  • Output dir: models_fine_tuned/llama3_8b_instruct_fewshot_both
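
The list above maps onto a peft/transformers configuration roughly as sketched below; the LoRA target modules are an assumption, since the card does not list them:

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # Assumption: common QLoRA targets; the actual target modules are not stated.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = TrainingArguments(
    output_dir="models_fine_tuned/llama3_8b_instruct_fewshot_both",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    weight_decay=0.001,
    warmup_ratio=0.1,
    optim="paged_adamw_8bit",
    bf16=True,
)
```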

Evaluation Results

Evaluated on 998 validation examples using the official QED metrics, reported at several F1 overlap thresholds (non-strict span matching):

| Overlap | Answer Accuracy | All Mention F1 | Pair F1 |
|---------|-----------------|----------------|---------|
| 0.50    | 82.4%           | 19.6%          | 10.4%   |
| 0.60    | 74.2%           | 19.5%          | 10.3%   |
| 0.70    | 68.2%           | 19.5%          | 10.3%   |
| 0.80    | 63.2%           | 19.5%          | 10.3%   |
| 0.90    | 59.8%           | 19.2%          | 10.0%   |
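
For intuition about the thresholds, a predicted span is typically matched against a gold span by token-level F1, as in the illustrative helper below (this is not the official QED scorer):

```python
from collections import Counter

def span_f1(pred: str, gold: str) -> float:
    """Token-level F1 between two text spans (whitespace tokenization)."""
    p, g = pred.split(), gold.split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

# A prediction counts as a match at threshold t when span_f1(pred, gold) >= t.
# Example: precision 1.0, recall 1/3 -> F1 = 0.5
f1 = span_f1("Richard Parker", "a Bengal tiger named Richard Parker")
assert abs(f1 - 0.5) < 1e-9
```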

Limitations & Ethical Considerations

  • The model is trained on a specific dataset and task; it may not generalize to other domains.
  • Outputs are not guaranteed to be factually correct or safe for critical applications.
  • Always validate outputs before use in downstream tasks.

Citation

If you use this model or code, please cite the original Llama 3 paper and the QED dataset paper as appropriate.


Author

  • Denis Rize