Model Card for HiTZ/Llama-3.1-8B-Instruct-multi-truth-judge
This model card is for a judge model fine-tuned to evaluate truthfulness, based on the work "Truth Knows No Language: Evaluating Truthfulness Beyond English".
Model Details
Model Description
This model is an LLM-as-a-Judge, fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct
to assess the truthfulness of text generated by other language models. The evaluation framework and findings are detailed in the paper "Truth Knows No Language: Evaluating Truthfulness Beyond English." The primary goal of this work is to extend truthfulness evaluations beyond English, covering English, Basque, Catalan, Galician, and Spanish. This specific judge model evaluates truthfulness across multiple languages.
- Developed by: Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria De Dios Flores, Rodrigo Agerri.
- Affiliations: HiTZ Center - Ixa, University of the Basque Country, UPV/EHU; Elhuyar; Centro de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela; Departament de Traducció i Ciències del Llenguatge, Universitat Pompeu Fabra.
- Funded by: MCIN/AEI/10.13039/501100011033 projects: DeepKnowledge (PID2021-127777OB-C21) and by FEDER, EU; Disargue (TED2021-130810B-C21) and European Union NextGenerationEU/PRTR; DeepMinor (CNS2023-144375) and European Union NextGenerationEU/PRTR; NÓS-ILENIA (2022/TL22/0021533). Xunta de Galicia: Centro de investigación de Galicia accreditation 2024-2027 ED431G-2023/04. UPV/EHU PIF22/84 predoc grant (Blanca Calvo Figueras). Basque Government PhD grant PRE_2024_2_0028 (Julen Etxaniz). Juan de la Cierva contract and project JDC2022-049433-I (Iria de Dios Flores), financed by the MCIN/AEI/10.13039/501100011033 and the European Union “NextGenerationEU”/PRTR.
- Shared by: HiTZ Center
- Model type: LLM-as-a-Judge, based on
Llama-3.1
- Language(s) (NLP): Fine-tuned to judge outputs in multiple languages (English, Basque, Catalan, Galician, Spanish). The underlying TruthfulQA-Multi benchmark, used for context, covers English, Basque, Catalan, Galician, and Spanish.
- License: The base model
meta-llama/Meta-Llama-3.1-8B-Instruct
is governed by the Llama 3.1 license. The fine-tuning code, this model's weights, and the TruthfulQA-Multi dataset are publicly available under Apache 2.0. - Finetuned from model:
meta-llama/Meta-Llama-3.1-8B-Instruct
Model Sources
- Repository (for the project and fine-tuning code):
https://github.com/hitz-zentroa/truthfulqa-multi
- Paper: "Truth Knows No Language: Evaluating Truthfulness Beyond English" (
https://arxiv.org/abs/2502.09387
) - Dataset (TruthfulQA-Multi):
https://huggingface.co/datasets/HiTZ/truthful_judge
Uses
Direct Use
This model is intended for direct use as an LLM-as-a-Judge. It takes a question, a reference answer, and a model-generated answer as input, and outputs a judgment on the truthfulness of the model-generated answer. This is particularly relevant for evaluating models on the TruthfulQA benchmark, specifically for multiple languages (English, Basque, Catalan, Galician, Spanish).
Downstream Use
This judge model could potentially be used as a component in larger systems for content moderation, automated fact-checking research, or as a basis for further fine-tuning on more specific truthfulness-related tasks or domains.
Out-of-Scope Use
This model is not designed for:
- Generating general-purpose creative text or dialogue.
- Providing factual information directly (it judges, it doesn't assert).
- Use in safety-critical applications without thorough validation.
- Any application intended to deceive or spread misinformation. The model's judgments are based on its training and may not be infallible.
Bias, Risks, and Limitations
The model's performance and biases are influenced by its base model (meta-llama/Meta-Llama-3.1-8B-Instruct
) and the TruthfulQA dataset. Key points from "Truth Knows No Language: Evaluating Truthfulness Beyond English":
- Language Discrepancies: LLMs (and thus judges based on them) may perform differently across languages.
- Anglocentricity: The original TruthfulQA benchmark has English-centric knowledge and cultural contexts. This model judges multiple non-English languages.
- Context and Time Dependency: Universal knowledge questions are handled better across languages than context- and time-dependent ones.
- Truthfulness: This is a "truth-judge". Truthfulness is a critical component of overall answer quality.
- Imitative Falsehoods: TruthfulQA focuses on "imitative falsehoods" – statements that are false but sound plausible or mimic common misconceptions.
Recommendations
- Users should be aware of the potential for varying performance across different types of questions (e.g., universal vs. culturally specific) and languages.
- For critical applications, judgments from this model should be cross-verified, ideally with human oversight.
- Consider the limitations regarding cultural and temporal variability when interpreting results.
How to Get Started with the Model
This model can be loaded using the Hugging Face transformers
library.
# Example (conceptual, actual usage depends on task setup)
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "HiTZ/Meta-Llama-3.1-8B-Instruct-multi-truth-judge"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# For LLM-as-a-Judge, you would typically format input like:
# prompt = f"Question: {question}\nReference Answer: {ref_answer}\nModel Answer: {model_answer}\nIs the model answer truthful? Judge:"
# inputs = tokenizer(prompt, return_tensors="pt")
# outputs = model.generate(**inputs) # Adjust generation parameters as needed
# judgment = tokenizer.decode(outputs[0], skip_special_tokens=True)
Refer to the project repository (https://github.com/hitz-zentroa/truthfulqa-multi
) for specific examples of how judge models were used in the evaluation.
Training Details
Training Data
The model was fine-tuned on a dataset derived from the TruthfulQA-Multi benchmark \cite{calvo-etal-2025-truthknowsnolanguage}.
- Dataset Link:
https://huggingface.co/datasets/HiTZ/truthful_judge
- Training Data Specifics: Trained on data for multiple languages (English, Basque, Catalan, Galician, Spanish) for truth judging. This corresponds to the "MT data (all languages except English)" mentioned in the paper for Truth-Judges.
Training Procedure
The model was fine-tuned as an LLM-as-a-Judge. The methodology was adapted from the original TruthfulQA paper \cite{lin-etal-2022-truthfulqa}, where the model learns to predict whether an answer is truthful given a question and reference answers.
Preprocessing
Inputs were formatted to present the judge model with a question, correct answer(s), and the answer to be judged, prompting it to assess truthfulness.
Training Hyperparameters
- Training regime:
bfloat16
mixed precision - Base model:
meta-llama/Meta-Llama-3.1-8B-Instruct
- Epochs: 5
- Learning rate: 0.01
- Batch size: Refer to project code
- Optimizer: Refer to project code
- Transformers Version:
4.44.2
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model's evaluation methodology is described in "Truth Knows No Language: Evaluating Truthfulness Beyond English," using questions from the TruthfulQA-Multi dataset (English, Basque, Catalan, Galician, Spanish portions).
Factors
- Language: Multiple languages (English, Basque, Catalan, Galician, Spanish).
- Model Type (of models being judged): Base and instruction-tuned LLMs.
- Evaluation Metric: Correlation of LLM-as-a-Judge scores with human judgments on truthfulness.
Metrics
- Primary Metric: Spearman correlation between the judge model's scores and human-annotated scores for truthfulness.
- The paper (Table 4) reports performance for Truth-Judge models. For the Llama-3.1-8B-Instruct base model trained on MT data (all languages except English), the Kappa scores were: Basque (0.51), Catalan (0.54), Galician (0.49), Spanish (0.57).
Results
Summary
As reported in "Truth Knows No Language: Evaluating Truthfulness Beyond English" (specifically Table 4 for Truth-Judges):
- This specific model (
multi_llama3.1_instruct_truth_judge
) is the Truth-Judge fine-tuned onmeta-llama/Meta-Llama-3.1-8B-Instruct
using combined multilingual data (English, Basque, Catalan, Galician, Spanish). - Performance varies by language, with Kappa scores detailed in Table 4 of the paper.
Technical Specifications
Model Architecture and Objective
The model is based on the Llama-3.1
architecture (LlamaForCausalLM
). It is a Causal Language Model fine-tuned with the objective of acting as a "judge" to predict the truthfulness of answers to questions.
- Hidden Size:
4096
- Intermediate Size:
14336
- Num Attention Heads:
32
- Num Hidden Layers:
32
- Num Key Value Heads:
8
- Vocab Size:
128256
Compute Infrastructure
- Hardware: Refer to project for details.
- Software: PyTorch, Transformers
4.44.2
Citation
Paper:
@inproceedings{calvo-etal-2025-truthknowsnolanguage,
title = "Truth Knows No Language: Evaluating Truthfulness Beyond English",
author = "Calvo Figueras, Blanca and Sagarzazu, Eneko and Etxaniz, Julen and Barnes, Jeremy and Gamallo, Pablo and De Dios Flores, Iria and Agerri, Rodrigo",
year={2025},
eprint={2502.09387},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.09387}
}
More Information
For more details on the methodology, dataset, and findings, please refer to the full paper "Truth Knows No Language: Evaluating Truthfulness Beyond English" and the project repository: https://github.com/hitz-zentroa/truthfulqa-multi
.
Model Card Authors
This model card was generated based on information from the paper "Truth Knows No Language: Evaluating Truthfulness Beyond English" by Blanca Calvo Figueras et al., and adapted from the Hugging Face model card template. Content populated by GitHub Copilot.
Model Card Contact
For questions about the model or the research, please contact:
- Blanca Calvo Figueras:
[email protected]
- Rodrigo Agerri:
[email protected]
- Downloads last month
- 1
Model tree for HiTZ/Llama-3.1-8B-Instruct-multi-truth-judge
Base model
meta-llama/Llama-3.1-8B