Update README.md
README.md (changed)
<!-- Provide a quick summary of what the model is/does. -->

Introducing **HallOumi-8B-classifier**, a _fast_ **SOTA hallucination detection model**, outperforming DeepSeek R1, OpenAI o1, Google Gemini 1.5 Pro, and Claude Sonnet 3.5 at only **8 billion parameters!**

Give HallOumi a try now!

* Demo: https://oumi.ai/halloumi-demo
* GitHub: https://github.com/oumi-ai/oumi/tree/main/configs/projects/halloumi

| Model             | Macro F1 Score   | Open?             | Model Size |
| ----------------- | ---------------- | ----------------- | ---------- |
| **HallOumi-8B**   | **77.2% ± 2.2%** | Truly Open Source | 8B         |
| Claude Sonnet 3.5 | 69.6% ± 2.8%     | Closed            | ??         |
| OpenAI o1-preview | 65.9% ± 2.3%     | Closed            | ??         |
| DeepSeek R1       | 61.6% ± 2.5%     | Open Weights      | 671B       |
| Llama 3.1 405B    | 58.8% ± 2.4%     | Open Weights      | 405B       |

…where it can be utilized safely and responsibly.

## Building Trust with Verifiability

To begin trusting AI systems, we have to be able to verify their outputs. By *verify*, we specifically mean that we need to:

* Understand the **truthfulness** of a particular statement produced by any model (the key focus of the **HallOumi-8B-classifier** model).
* Understand what **information supports that statement’s truth** and have **full traceability** connecting the statement to that information (provided by our *generative* [HallOumi model](https://huggingface.co/oumi-ai/HallOumi-8B)).

- **Developed by:** [Oumi AI](https://oumi.ai/)
- **Language(s) (NLP):** English
- **License:** [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en)
- **Finetuned from model:** [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Demo:** [HallOumi Demo](https://oumi.ai/halloumi)

---

<!-- Address questions around how the model is intended to be used, including the foreseeable users and those affected by the model. -->
Use this model to verify claims and detect hallucinations in scenarios where a known source of truth is available.

Demo: https://oumi.ai/halloumi-demo

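As a quick illustration, the sketch below loads the classifier with the Hugging Face `transformers` library and scores a single claim against a source document. The `<context>`/`<claims>` prompt template, the use of a sequence-classification head, and the label order are assumptions made for this sketch; see the GitHub project linked above for the exact, supported usage.

```python
# Hypothetical usage sketch -- prompt template, classification head, and label
# order are assumptions; consult the HallOumi GitHub project for the supported interface.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "oumi-ai/HallOumi-8B-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, torch_dtype=torch.bfloat16)

context = "The capital of France is Paris. It is known for the Eiffel Tower."
claim = "The Eiffel Tower is located in Berlin."

# Assumed input format: the source document followed by the claim to verify.
prompt = f"<context>\n{context}\n</context>\n\n<claims>\n{claim}\n</claims>"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Assumed label order: index 0 = supported, index 1 = hallucinated.
probs = torch.softmax(logits, dim=-1)[0]
print(f"P(supported) = {probs[0].item():.3f}, P(hallucinated) = {probs[1].item():.3f}")
```
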
## Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
Smaller LLMs have limited capabilities and should be used with caution. Avoid using this model for purposes outside of claim verification.

## Bias, Risks, and Limitations

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when relevant to the training procedure. -->
For information on training, see https://oumi.ai/halloumi

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->
Follow along with our notebook on how to evaluate hallucination detection with HallOumi and other popular models:
https://github.com/oumi-ai/oumi/blob/main/configs/projects/halloumi/halloumi_eval_notebook.ipynb

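The notebook covers the full protocol. As a rough illustration of the headline metric only, the snippet below computes a macro F1 score with a simple bootstrap confidence interval over binary supported/hallucinated labels; the intervals reported in the table above may be computed differently.

```python
# Illustrative only: macro F1 with a bootstrap confidence interval over
# binary labels (0 = supported, 1 = hallucinated). The official protocol is
# in the HallOumi evaluation notebook linked above.
import numpy as np
from sklearn.metrics import f1_score

def macro_f1_with_ci(y_true, y_pred, n_boot=10_000, seed=0):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    point = f1_score(y_true, y_pred, average="macro")
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        scores.append(f1_score(y_true[idx], y_pred[idx], average="macro"))
    lo, hi = np.percentile(scores, [2.5, 97.5])
    return point, (lo, hi)

# Toy example with made-up labels
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]
print(macro_f1_with_ci(y_true, y_pred, n_boot=1000))
```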

## Environmental Impact