LoRA Adapter for Hallucination Detection in RAG outputs

Welcome to Granite Experiments!

Think of Experiments as a preview of what's to come. These projects are still under development, but we wanted to let the open-source community take them for a spin! Use them, break them, and help us build what's next for Granite – we'll keep an eye out for feedback and questions. Happy exploring!

Just a heads-up: Experiments are forever evolving, so we can't commit to ongoing support or guarantee performance.

Model Summary

This is a RAG-specific LoRA adapter for ibm-granite/granite-3.2-8b-instruct that is fine-tuned to detect hallucinations in model outputs. Given a multi-turn conversation between a user and an AI assistant that ends with an assistant response, and a set of documents/passages on which that last response is supposed to be based, the adapter outputs a faithfulness score range (hallucination risk range) for each sentence in the assistant response.

Developer: IBM Research
Model type: LoRA adapter for ibm-granite/granite-3.2-8b-instruct
License: Apache 2.0

Intended use

This LoRA adapter identifies hallucination risks for the sentences of the last assistant response in a multi-turn RAG conversation, based on a set of provided documents/passages.

Note: While you can invoke the LoRA adapter directly, as outlined below, we highly recommend calling it through granite-io, which wraps it with a tailored I/O processor. The I/O processor provides a friendlier interface, as it takes care of various data transformations and validation tasks. These include, among others, splitting the assistant response into sentences before calling the adapter, validating the adapter's output, and transforming the sentence IDs returned by the adapter into appropriate spans over the response.

However, if you prefer to invoke the LoRA adapter directly, the expected input/output is described below.

Model input: The input to the model is a list of conversational turns ending with an assistant response, together with a list of documents, converted to a string using the apply_chat_template function. For the adapter to work, the last assistant response must be pre-split into sentences, with a sentence index prepended to each sentence. In more detail, the primary inputs are the following two items, each represented in JSON:

conversation: A list of conversational turns between the user and the assistant, where each item in the list is a dictionary with fields role and content. The role is either user or assistant, denoting user and assistant turns, respectively, while the content field contains the corresponding user/assistant utterance. The conversation should end with an assistant turn, and the content field of that turn should contain the assistant utterance with each sentence prefixed with a response ID of the form <rI>, where I is an integer. The numbering should start from 0 (for the first sentence) and be incremented by one for each subsequent sentence in the last assistant turn.
documents: A list of documents, where each item in the list is a dictionary with fields doc_id and text. The text field contains the text of the corresponding document.
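
The structural requirements above can be checked before prompting the adapter. The following is a minimal sketch, not part of granite-io or of the adapter's own tooling: the helper name check_adapter_inputs and its error messages are hypothetical, and it only checks the field names and the <rI> numbering described above.

```python
import re

# Hypothetical helper for illustration; not part of granite-io.
_ID_PATTERN = re.compile(r"<r(\d+)>")


def check_adapter_inputs(conversation, documents):
    """Return a list of problems found; an empty list means the inputs look well-formed."""
    problems = []

    # Every turn needs exactly the fields 'role' and 'content'.
    for i, turn in enumerate(conversation):
        if set(turn) != {"role", "content"}:
            problems.append(f"turn {i}: expected exactly the fields 'role' and 'content'")
        elif turn["role"] not in ("user", "assistant"):
            problems.append(f"turn {i}: unexpected role {turn['role']!r}")

    # The conversation must end with an assistant turn whose sentences
    # carry the IDs <r0>, <r1>, ... in increasing order.
    if not conversation or conversation[-1].get("role") != "assistant":
        problems.append("conversation must end with an assistant turn")
    else:
        ids = [int(m) for m in _ID_PATTERN.findall(conversation[-1].get("content", ""))]
        if not ids or ids != list(range(len(ids))):
            problems.append("last assistant turn needs sentence IDs <r0>, <r1>, ... in order")

    # Every document needs the fields 'doc_id' and 'text'.
    for j, doc in enumerate(documents):
        if "doc_id" not in doc or "text" not in doc:
            problems.append(f"document {j}: expected fields 'doc_id' and 'text'")

    return problems
```

The check is meant to run on the conversation before the task instruction is appended, which is why only the user and assistant roles are accepted.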

Additionally, this LoRA adapter is trained with a task instruction, which is encoded as a dictionary with fields role and content, where role is system and content is the following string describing the hallucination detection task: Split the last assistant response into individual sentences. For each sentence in the last assistant response, identify the faithfulness score range. Ensure that your output includes all response sentence IDs, and for each response sentence ID, provide the corresponding faithfulness score range. The output must be a json structure.

To prompt the LoRA adapter, we combine the above components as follows: We first append the instruction to the end of the conversation to generate an input_conversation list. Then we invoke the apply_chat_template function with parameters: conversation = input_conversation and documents = documents.

Model output: When prompted with the above input, the model generates a faithfulness score range (hallucination risk) for each sentence of the last assistant response, in the form of a JSON dictionary. The dictionary is of the form {"<r0>": "value_0", "<r1>": "value_1", ...}, where each field <rI>, with I an integer, corresponds to the ID of a sentence in the last assistant response, and its value is the faithfulness score range (hallucination risk) of that sentence. The output values are numeric ranges between 0 and 1 in increments of 0.1, where higher values correspond to high faithfulness (low hallucination risk) and lower values correspond to low faithfulness (high hallucination risk). Additionally, the model is trained to output unanswerable when the response sentence indicates that the question is not answerable, and to output NA when the faithfulness cannot be determined (e.g., for very short sentences).
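
The output dictionary described above can be turned into numeric bounds with a small parser. This is a minimal sketch under one loud assumption: the exact string format of a range is not specified here, so the code assumes values shaped like "0.8-0.9" (two numbers separated by a hyphen). Any other value, including unanswerable and NA, is passed through unchanged. The function name parse_adapter_output is hypothetical.

```python
import json
import re

# Assumption: a range is written as "<low>-<high>", e.g. "0.8-0.9".
_RANGE = re.compile(r"^\s*([01](?:\.\d+)?)\s*-\s*([01](?:\.\d+)?)\s*$")


def parse_adapter_output(output_text):
    """Map each sentence ID to (low, high), or to its label if it is not a numeric range."""
    parsed = {}
    for sent_id, value in json.loads(output_text).items():
        match = _RANGE.match(str(value))
        if match:
            parsed[sent_id] = (float(match.group(1)), float(match.group(2)))
        else:
            # Keep non-numeric labels such as "unanswerable" or "NA" untouched.
            parsed[sent_id] = str(value)
    return parsed
```

Keeping the labels as strings lets downstream code decide separately how to treat sentences that have no numeric score.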

Quickstart Example

As explained above, it is highly recommended to use the LoRA adapter through granite-io.

However, if you prefer to invoke the LoRA adapter directly, you can use the following code.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# NLTK's sentence tokenizer needs its Punkt data to be downloaded beforehand
from nltk import tokenize
import json

BASE_NAME = "ibm-granite/granite-3.2-8b-instruct"
LORA_NAME = "ibm-granite/granite-3.2-8b-lora-rag-hallucination-detection"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side='left', trust_remote_code=True)
model_base = AutoModelForCausalLM.from_pretrained(BASE_NAME, device_map="auto")
model_hallucination = PeftModel.from_pretrained(model_base, LORA_NAME)

hallucination_sys_prompt = "Split the last assistant response into individual sentences. For each sentence in the last assistant response, identify the faithfulness score range. Ensure that your output includes all response sentence IDs, and for each response sentence ID, provide the corresponding faithfulness score range. The output must be a json structure."

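def format_conversation(conversation):
    """Prefix each sentence of the last assistant turn with <rI> and append the task instruction.

    Note: this modifies the passed-in conversation list in place.
    """
    # Split the last assistant response into sentences
    response_sents = tokenize.sent_tokenize(conversation[-1]["content"])
    # Prepend the sentence IDs <r0>, <r1>, ...
    response_sents_with_ids = [f"<r{ind}> {sent}" for ind, sent in enumerate(response_sents)]
    conversation[-1]["content"] = ' '.join(response_sents_with_ids)
    # Append the task instruction as a final system turn
    conversation.append({
        "role": "system",
        "content": hallucination_sys_prompt
    })
    return conversation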


conversation = [
    {
        "role": "user",
        "content": "What happened to Dennis Wilson of the Beach Boys in 1983?"
    },
    {
        "role": "assistant",
        "content": "Dennis Wilson of the Beach Boys drowned in Marina del Rey on December 28, 1983, while diving from a friend's boat trying to recover items that he had previously thrown overboard in fits of rage. Forensic pathologists believed that Dennis experienced shallow-water blackout just before his death"
    }
]
input_conversation = format_conversation(conversation=conversation)

documents = [
    {
        "doc_id": 1,
        "text": "The Beach Boys are an American rock band formed in Hawthorne, California, in 1961. The group's original lineup consisted of brothers Brian, Dennis, and Carl Wilson; their cousin Mike Love; and their friend Al Jardine. Distinguished by their vocal harmonies and early surf songs, they are one of the most influential acts of the rock era. The band drew on the music of jazz-based vocal groups, 1950s rock and roll, and black R&B to create their unique sound, and with Brian as composer, arranger, producer, and de facto leader, often incorporated classical or jazz elements and unconventional recording techniques in innovative ways. In 1983, tensions between Dennis and Love escalated so high that each obtained a restraining order against each other. With the rest of the band fearing that he would end up like Brian, Dennis was given an ultimatum after his last performance in November 1983 to check into rehab for his alcohol problems or be banned from performing live with them. Dennis checked into rehab for his chance to get sober, but on December 28, 1983, he fatally drowned in Marina del Rey while diving from a friend's boat trying to recover items that he had previously thrown overboard in fits of rage."
    },
    {
        "doc_id": 2,
        "text": "A cigarette smoker since the age of 13, Carl was diagnosed with lung cancer after becoming ill at his vacation home in Hawaii, in early 1997. Despite his illness, Carl continued to perform while undergoing chemotherapy. He played and sang throughout the Beach Boys' entire summer tour which ended in the fall of 1997. During the performances, he sat on a stool, but he stood while singing \"God Only Knows\".  Carl died of lung cancer in Los Angeles, surrounded by his family, on February 6, 1998, just two months after the death of his mother, Audree Wilson. He was interred at Westwood Village Memorial Park Cemetery in Los Angeles."
    },
    {
        "doc_id": 3,
        "text": "Carl Dean Wilson (December 21, 1946 - February 6, 1998) was an American musician, singer, and songwriter who co-founded the Beach Boys. He is best remembered as their lead guitarist, as the youngest brother of bandmates Brian and Dennis Wilson, and as the group's de facto leader in the early 1970s. He was also the band's musical director on stage from 1965 until his death. Influenced by the guitar playing of Chuck Berry and the Ventures, Carl's initial role in the group was that of lead guitarist and backing vocals, but he performed lead vocals on several of their later hits, including \"God Only Knows\" (1966), \"Good Vibrations\" (1966), and \"Kokomo\" (1988). By the early 1980s the Beach Boys were in disarray; the band had split into several camps. Frustrated with the band's sluggishness to record new material and reluctance to rehearse, Wilson took a leave of absence in 1981.  He quickly recorded and released a solo album, Carl Wilson, composed largely of rock n' roll songs co-written with Myrna Smith-Schilling, a former backing vocalist for Elvis Presley and Aretha Franklin, and wife of Wilson's then-manager Jerry Schilling. The album briefly charted, and its second single, \"Heaven\", reached the top 20 on Billboard's Adult Contemporary chart."
    }
]

# Build the prompt from the conversation and the documents
input_text = tokenizer.apply_chat_template(conversation=input_conversation, documents=documents, tokenize=False)

# Generate the per-sentence faithfulness score ranges
inputs = tokenizer(input_text, return_tensors="pt")
output = model_hallucination.generate(inputs["input_ids"].to(device), attention_mask=inputs["attention_mask"].to(device), max_new_tokens=500)
# Decode only the newly generated tokens
output_text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
# The adapter is expected to return a JSON dictionary keyed by sentence ID
print("Output:", json.loads(output_text))
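
When calling the adapter directly, the returned sentence IDs still have to be mapped back to positions in the original response, a step the granite-io I/O processor handles for you. The following is a minimal sketch of one way to do this and is not granite-io's implementation: the helper name sentence_spans is hypothetical, and it assumes you kept a copy of the original response text (format_conversation above overwrites the last assistant turn) along with the list of sentences in the order used for the IDs.

```python
def sentence_spans(response, sentences):
    """Return {"<rI>": (start, end)} character offsets of each sentence in response."""
    spans = {}
    cursor = 0
    for index, sentence in enumerate(sentences):
        # Search from the end of the previous sentence so that repeated
        # sentences map to successive occurrences.
        start = response.find(sentence, cursor)
        if start == -1:
            raise ValueError(f"sentence {index} not found in response")
        end = start + len(sentence)
        spans[f"<r{index}>"] = (start, end)
        cursor = end
    return spans
```

With these spans, response[start:end] recovers the text of the sentence that a given ID refers to, so each faithfulness range can be attached to a highlighted region of the response.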

Training Details

The process of generating the training data consisted of two main steps:

Multi-turn RAG conversation generation: Starting from publicly available document corpora, we generated a set of multi-turn RAG data, consisting of multi-turn conversations grounded on passages retrieved from the corpus. For details on the RAG conversation generation process, please refer to the Granite Technical Report and Lee, Young-Suk, et al.
Faithfulness label generation: To create the faithfulness labels for responses, we used the NLI-based technique described in Achintalwar, et al.

This process resulted in ~130K data instances, which were used to train the LoRA adapter.

Training Data

The following public datasets were used as seed datasets for the multi-turn RAG conversation generation process:

CoQA - Wikipedia passages
MultiDoc2Dial
QuAC

Training Hyperparameters

The LoRA adapter was fine-tuned using PEFT under the following regime: rank = 8, learning rate = 1e-5, and 90/10 split between training and validation.

Evaluation

We evaluate the LoRA adapter on the QA portion of the RAGTruth benchmark. We compare the response-level hallucination detection performance of the LoRA adapter with that of the methods reported in the RAGTruth paper. A response is considered hallucinated if at least one of its sentences obtains a faithfulness score less than 0.1.

The results are shown in the table below. The results for the baselines are extracted from the RAGTruth paper.

Model	Precision	Recall	F1
gpt-3.5-turbo (prompted)	18.8	84.4	30.8
gpt-4-turbo (prompted)	33.2	90.6	45.6
SelfCheckGPT	35	58	43.7
LMvLM	18.7	76.9	30.1
Finetuned Llama-2-13B	61.6	76.3	68.2
hallucination-detection LoRA	67.6	77.4	72.2
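
The response-level decision rule and the metrics in the table can be sketched as follows. This is an illustrative sketch, not the evaluation script used for the reported numbers. It assumes each sentence has already been reduced to a single numeric score (for example the lower bound of its range, which is an assumption), and that sentences labeled unanswerable or NA are left out. Both function names are hypothetical.

```python
def is_hallucinated(sentence_scores, threshold=0.1):
    """Response-level rule: flag the response if any sentence scores below the threshold."""
    return any(score < threshold for score in sentence_scores)


def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 for the 'hallucinated' class, from parallel boolean lists."""
    pairs = list(zip(predicted, gold))
    tp = sum(p and g for p, g in pairs)        # flagged and truly hallucinated
    fp = sum(p and not g for p, g in pairs)    # flagged but faithful
    fn = sum(g and not p for p, g in pairs)    # hallucinated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The F1 column in the table is the harmonic mean of the other two columns: for the LoRA adapter, 2 × 67.6 × 77.4 / (67.6 + 77.4) ≈ 72.2.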

Model Card Authors

Chulaka Gunasekara
