Update README.md
README.md (changed)
<!-- Provide a quick summary of what the model is/does. -->

Introducing **HallOumi-8B-classifier**, a _fast_ **SOTA hallucination detection model**, outperforming DeepSeek R1, OpenAI o1, Google Gemini 1.5 Pro, and Claude Sonnet 3.5 at only **8 billion parameters!**

Give HallOumi a try now!

* Demo: https://oumi.ai/halloumi-demo
* GitHub: https://github.com/oumi-ai/oumi/tree/main/configs/projects/halloumi

| Model             | Macro F1 Score   | Open?             | Model Size |
| ----------------- | ---------------- | ----------------- | ---------- |
| **HallOumi-8B**   | **77.2% ± 2.2%** | Truly Open Source | 8B         |
| Claude Sonnet 3.5 | 69.6% ± 2.8%     | Closed            | ??         |
| OpenAI o1-preview | 65.9% ± 2.3%     | Closed            | ??         |
| DeepSeek R1       | 61.6% ± 2.5%     | Open Weights      | 671B       |
| Llama 3.1 405B    | 58.8% ± 2.4%     | Open Weights      | 405B       |

…where it can be utilized safely and responsibly.

## Building Trust with Verifiability

To begin trusting AI systems, we have to be able to verify their outputs. By *verify*, we specifically mean that we need to:

* Understand the **truthfulness** of a particular statement produced by any model (the key focus of the **HallOumi-8B-classifier** model).
* Understand what **information supports that statement’s truth** and have **full traceability** connecting the statement to that information (provided by our *generative* [HallOumi model](https://huggingface.co/oumi-ai/HallOumi-8B)).

- **Developed by:** [Oumi AI](https://oumi.ai/)
- **Language(s) (NLP):** English
- **License:** [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en)
- **Finetuned from model:** [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Demo:** [HallOumi Demo](https://oumi.ai/halloumi)

---

<!-- Address questions around how the model is intended to be used, including the foreseeable users and those affected by the model. -->
Use this model to verify claims and detect hallucinations in scenarios where a known source of truth is available.

Demo: https://oumi.ai/halloumi-demo

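As a quick illustration, the sketch below loads the classifier with the Hugging Face `transformers` library and scores a single claim against a source document. The `<context>`/`<claims>` prompt template, the use of a sequence-classification head, and the label order are assumptions made for this sketch; see the GitHub project linked above for the exact, supported usage.

```python
# Hypothetical usage sketch -- prompt template, classification head, and label
# order are assumptions; consult the HallOumi GitHub project for the supported interface.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "oumi-ai/HallOumi-8B-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=torch.bfloat16)

context = "The capital of France is Paris. It is known for the Eiffel Tower."
claim = "The Eiffel Tower is located in Berlin."

# Assumed input format: the source document followed by the claim to verify.
prompt = f"<context>\n{context}\n</context>\n\n<claims>\n{claim}\n</claims>"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Assumed label order: index 0 = supported, index 1 = hallucinated.
probs = torch.softmax(logits, dim=-1)[0]
print(f"P(supported) = {probs[0].item():.3f}, P(hallucinated) = {probs[1].item():.3f}")
```
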
## Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
Smaller LLMs have limited capabilities and should be used with caution. Avoid using this model for purposes outside of claim verification.

## Bias, Risks, and Limitations

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when relevant to the training procedure. -->
For information on training, see https://oumi.ai/halloumi

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->
Follow along with our notebook on how to evaluate hallucination detection with HallOumi and other popular models:
https://github.com/oumi-ai/oumi/blob/main/configs/projects/halloumi/halloumi_eval_notebook.ipynb

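The notebook covers the full protocol. As a rough illustration of the headline metric only, the snippet below computes a macro F1 score with a simple bootstrap confidence interval over binary supported/hallucinated labels; the intervals reported in the table above may be computed differently.

```python
# Illustrative only: macro F1 with a bootstrap confidence interval over
# binary labels (0 = supported, 1 = hallucinated). The official protocol is
# in the HallOumi evaluation notebook linked above.
import numpy as np
from sklearn.metrics import f1_score

def macro_f1_with_ci(y_true, y_pred, n_boot=10_000, seed=0):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    point = f1_score(y_true, y_pred, average="macro")
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        scores.append(f1_score(y_true[idx], y_pred[idx], average="macro"))
    lo, hi = np.percentile(scores, [2.5, 97.5])
    return point, (lo, hi)

# Toy example with made-up labels
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]
print(macro_f1_with_ci(y_true, y_pred, n_boot=1000))
```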

## Environmental Impact