Amit Kumar committed
Commit · c44f168 · Parent(s): 0f6aea4

    added accuracy and corrected the lists

about/description.md CHANGED (+8 -8)
@@ -1,5 +1,5 @@
 <h2 style="color: #00ff00;">Goal:</h2>
-The goal of Classification Medical …
+The goal of the Classification Medical NLP Leaderboard is to track, rank, and evaluate the performance of large language models (LLMs) on medical classification tasks. It evaluates LLMs across a diverse array of medical datasets, starting with a radiological reports dataset, as follows:
 
 | S.No | Dataset Name | About the Dataset | Type of Classification | Link to the Dataset |
 |------|---------|---|-------------------------|---------|
@@ -9,14 +9,14 @@ The goal of Classification Medical LLM Leaderboard is to track, rank and evaluat
 The leaderboard offers a comprehensive assessment of each model's classification performance.
 
 <h2 style="color: #00ff00;">Evaluation Criteria:</h2>
-
+The primary metric used for evaluation is accuracy, which measures the proportion of correct predictions made by the model. We use two levels of accuracy (see the first sketch after the diff): <br>
+1. <b>Label-level accuracy</b>: accuracy is measured over all individual labels. <br>
+2. <b>Record-level accuracy</b>: a report counts as correct only if it is classified correctly across all of its labels.
 
-<h2 style="color: #00ff00;">Different Parameters:</h2>
-
-
-
-2. <b> Different roles </b> : Some models like llama allows to assign different content to different roles like system or user in this case.
-3. <b> Chain of thought prompting: </b> A kind of prompting technique where, a complex task is broken down into simple steps. It is proven to be better than simple prompting. Refer [COT prompting](https://www.promptingguide.ai/techniques/cot)
+<h2 style="color: #00ff00;">Different Parameters:</h2> The leaderboard reports results under the different settings explored (a combined sketch follows the diff): <br>
+1. <b>Different shots prompting</b>: 0-shot, 1-shot, and 5-shot. <br>
+2. <b>Different roles</b>: some models, such as Llama, allow assigning different content to different roles, here system and user. <br>
+3. <b>Chain-of-thought prompting</b>: a prompting technique in which a complex task is broken down into simple steps; it has been shown to outperform plain prompting. See [COT prompting](https://www.promptingguide.ai/techniques/cot). <br>
 4. <b>Active Prompt</b>: in progress; it will be used along with different-shots prompting.
 
 <h2 style="color: #00ff00;">Submit your model or dataset:</h2>
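To make the two accuracy levels concrete, here is a minimal sketch of how they could be computed. It is an illustration under an assumed data layout (per-report label dictionaries), not the leaderboard's actual scoring code:

```python
from typing import Dict, List

Report = Dict[str, str]  # assumed layout: label name -> gold/predicted value

def label_level_accuracy(gold: List[Report], pred: List[Report]) -> float:
    """Proportion of individual labels predicted correctly, pooled over all reports."""
    correct = total = 0
    for g, p in zip(gold, pred):
        for label, value in g.items():
            correct += int(p.get(label) == value)
            total += 1
    return correct / total

def record_level_accuracy(gold: List[Report], pred: List[Report]) -> float:
    """Proportion of reports whose labels are ALL predicted correctly."""
    hits = sum(all(p.get(k) == v for k, v in g.items()) for g, p in zip(gold, pred))
    return hits / len(gold)

# Hypothetical example: two radiology reports, two labels each.
gold = [{"fracture": "yes", "pneumonia": "no"}, {"fracture": "no", "pneumonia": "no"}]
pred = [{"fracture": "yes", "pneumonia": "yes"}, {"fracture": "no", "pneumonia": "no"}]
print(label_level_accuracy(gold, pred))   # 3 of 4 labels correct -> 0.75
print(record_level_accuracy(gold, pred))  # 1 of 2 reports fully correct -> 0.5
```

Record-level accuracy is the stricter of the two: a single wrong label makes the whole report count as incorrect, so it is always less than or equal to label-level accuracy.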
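The prompting settings above (shots, roles, chain of thought) can be combined in a single prompt-construction sketch. The message format follows the system/user chat convention that Llama-style chat models accept; the instruction text, shot pool, and helper names are hypothetical, not the leaderboard's actual prompts:

```python
from typing import Dict, List

# Hypothetical system instruction and labelled shot pool.
SYSTEM = "You are a radiologist. Classify the report for the label 'fracture': answer yes or no."
SHOT_POOL = [
    ("Transverse fracture of the distal radius.", "yes"),
    ("Lungs are clear. No acute osseous abnormality.", "no"),
]

def build_messages(report: str, shots: int = 0, chain_of_thought: bool = False) -> List[Dict[str, str]]:
    """Assemble role-tagged chat messages for an n-shot prompt, optionally with CoT."""
    demos = "".join(f"Report: {t}\nAnswer: {a}\n\n" for t, a in SHOT_POOL[:shots])  # n-shot examples
    cot = "Think step by step, then give the final answer.\n" if chain_of_thought else ""
    return [
        {"role": "system", "content": SYSTEM},  # role setting: instructions go to 'system'
        {"role": "user", "content": f"{demos}{cot}Report: {report}\nAnswer:"},
    ]

# A 1-shot prompt with chain-of-thought enabled:
for m in build_messages("Hairline fracture of the tibia.", shots=1, chain_of_thought=True):
    print(f"[{m['role']}] {m['content']}")
```

In the 0-shot setting `demos` is empty; the 5-shot setting would simply draw five examples from a larger pool.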