Update README.md
README.md

The following two different evaluations are performed.

Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A **lower perplexity** score indicates better performance (i.e., the model is more confident in its predictions).
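Formally, perplexity is the exponentiated average negative log-likelihood of a tokenized sequence: `PPL(X) = exp(-(1/N) * Σ_i log p(x_i | x_<i))`.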

#### Main Results

| Model                      | Perplexity Score |
|----------------------------|------------------|
| **Llama-3.1-8B-Instruct**  | 842611366.59     |
| **Llama-3.1-10B-Instruct** | 2890.31          |

#### Scripts to generate evaluation results
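
A minimal sketch of the scoring script using the Hugging Face `evaluate` perplexity metric; the evaluation corpus (WikiText-2 here) and the exact `compute` arguments are assumptions and may differ from the original run:
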
```python
from evaluate import load
import datasets

# Assumed corpus: WikiText-2 test split; swap in your own evaluation texts if needed
texts = datasets.load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"]
texts = [t for t in texts if t.strip()][:200]  # drop empty lines, keep a small sample

perplexity = load("perplexity", module_type="metric")
results = perplexity.compute(model_id="rwmasood/llama-3.1-10b-instruct", predictions=texts)
print(round(results["mean_perplexity"], 2))
```

### Harness Evaluation

The library used is the [lm-evaluation-harness repository](https://github.com/EleutherAI/lm-evaluation-harness).

| **Llama-3.1-8B-Instruct** | **73** | **71.1** | **87.9** |

#### Scripts to generate evaluation results
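
A condensed sketch of the harness run; the `lm_eval` import and the `model_args`, `tasks`, and `batch_size` arguments shown below are assumptions and may need adjusting for your environment:
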
```python
# install from https://github.com/EleutherAI/lm-evaluation-harness
from lm_eval import evaluator  # assumed import for the `simple_evaluate` call used below

tasks_list = ["arc_challenge", "gpqa", "ifeval", "mmlu_pro", "hellaswag"]  # Benchmark tasks
model_path = 'rwmasood/llama-3.1-10b-instruct'   # published model on the Hugging Face Hub
model_name_or_path = "./output/checkpoint-2800"  # local checkpoint (alternative evaluation target)

# Run evaluation
results = evaluator.simple_evaluate(
    model="hf",                              # Hugging Face model
    model_args=f"pretrained={model_path}",   # assumed wiring; use model_name_or_path to score the local checkpoint
    tasks=tasks_list,
    batch_size=8,                            # assumed value
)
print(results["results"])
```