# TD-HallOumi-3B: Llama 3.2 3B for Hallucination Detection / Claim Verification

This model is a fine-tuned version of `meta-llama/Llama-3.2-3B-Instruct` specifically adapted for **Claim Verification** and **Hallucination Detection**. It assesses whether the claims made in a response are supported by a given context document.

This work is inspired by, and uses datasets developed for, the [HallOumi project by Oumi AI](https://oumi.ai/blog/posts/introducing-halloumi), which aims to build trust in AI systems by making their outputs verifiable. This 3B-parameter model is provided by the **TEEN-DIFFERENT** community.
## Performance

Evaluated on the [oumi-ai/oumi-groundedness-benchmark](https://huggingface.co/datasets/oumi-ai/oumi-groundedness-benchmark) for Hallucination Detection (Macro F1 Score):

* **TD-HallOumi-3B\*** achieves **68.00%** Macro F1.
* **Highly Efficient:** This 3B-parameter model outperforms larger models on this benchmark, including OpenAI o1, Llama 3.1 405B, and Gemini 1.5 Pro.
* **Competitive:** Ranks just behind Claude 3.5 Sonnet (69.60%), a gap of 1.6 percentage points.

This model offers strong hallucination-detection performance with significantly fewer parameters than the larger models it is compared against.
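The scores above are macro F1. As a rough illustration of what that metric measures, here is a minimal sketch that computes it for binary claim labels. The label names `"supported"` and `"unsupported"` and the example data are hypothetical, and this is not the benchmark's official evaluation script; it follows the standard definition (the unweighted mean of per-class F1), which is the same quantity scikit-learn's `f1_score(..., average="macro")` reports when no explicit label list is passed.

```python
from collections.abc import Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], cls: Hashable) -> float:
    """F1 for one class, treating `cls` as positive and every other label as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # F1 = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN), which is 0 whenever TP = 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = set(y_true) | set(y_pred)
    if not classes:
        raise ValueError("need at least one example")
    return sum(per_class_f1(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical gold and predicted labels for four claims.
gold = ["supported", "supported", "unsupported", "unsupported"]
pred = ["supported", "unsupported", "unsupported", "unsupported"]
print(f"{macro_f1(gold, pred):.4f}")  # supported F1 = 2/3, unsupported F1 = 0.8 -> 0.7333
```

Because both classes count equally, a detector that labels every claim "supported" gets zero F1 on the "unsupported" class, so its macro F1 stays below 0.5 whenever the data contains at least one unsupported claim.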
## Model Details

* **Base Model:** [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)