Amit Kumar committed on
Commit c44f168 · 1 Parent(s): 0f6aea4

added accuracy and corrected the lists

Files changed (1)
  1. about/description.md +8 -8
about/description.md CHANGED
@@ -1,5 +1,5 @@
 <h2 style="color: #00ff00;">Goal:</h2>
- The goal of Classification Medical LLM Leaderboard is to track, rank and evaluate the performance of large language models (LLMs) on medical classification tasks. It evaluates LLMs across a diverse array of medical datasets, starting with radiological reports dataset as follows:
+ The goal of the Classification Medical NLP Leaderboard is to track, rank, and evaluate the performance of large language models (LLMs) on medical classification tasks. It evaluates LLMs across a diverse array of medical datasets, starting with a radiological reports dataset, as follows:
 
 | S.No | Dataset Name | About the Dataset | Type of Classification | Link to the Dataset |
 |------|---------|---|-------------------------|---------|
@@ -9,14 +9,14 @@ The goal of Classification Medical LLM Leaderboard is to track, rank and evaluat
 The leaderboard offers a comprehensive assessment of each model's classification aspects.
 
 <h2 style="color: #00ff00;">Evaluation Criteria:</h2>
- 1. Accuracy: The primary metric used for evaluation is accuracy, which measures the proportion of correct predictions made by the model.
-
- <h2 style="color: #00ff00;">Different Parameters:</h2>
-
- The leaderboard displays the different type of settings explored to get various results
- 1. <b> Different shots prompting </b>: 0 shot, 1 shot, 5 shots.
- 2. <b> Different roles </b> : Some models like llama allows to assign different content to different roles like system or user in this case.
- 3. <b> Chain of thought prompting: </b> A kind of prompting technique where, a complex task is broken down into simple steps. It is proven to be better than simple prompting. Refer [COT prompting](https://www.promptingguide.ai/techniques/cot)
+ The primary metric used for evaluation is accuracy, which measures the proportion of correct predictions made by the model. We use two levels of accuracy; a short computational sketch follows the list. <br>
+ 1. <b>Label-level accuracy</b>: accuracy is measured over all individual labels across all reports. <br>
+ 2. <b>Record-level accuracy</b>: a report counts as correct only if it is classified accurately across all of its labels.
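
A minimal sketch of how these two accuracy levels could be computed, assuming multi-label predictions stored as binary matrices; the arrays, shapes, and names below are illustrative, not the leaderboard's actual evaluation code:

```python
import numpy as np

# Hypothetical (n_reports, n_labels) binary matrices; stand-ins only.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])

# Label-level accuracy: fraction of individual label predictions that match.
label_level = (y_true == y_pred).mean()

# Record-level accuracy: fraction of reports whose labels ALL match.
record_level = (y_true == y_pred).all(axis=1).mean()

print(f"label-level:  {label_level:.3f}")   # 8 of 9 labels correct
print(f"record-level: {record_level:.3f}")  # 2 of 3 reports fully correct
```

Record-level accuracy is the stricter of the two: a single wrong label makes the whole report count as incorrect, so it is always less than or equal to label-level accuracy.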
+
+ <h2 style="color: #00ff00;">Different Parameters:</h2> The leaderboard displays the different types of settings explored to get the various results; a prompt-construction sketch follows the list. <br>
+ 1. <b>Different shots prompting</b>: 0-shot, 1-shot, 5-shot. <br>
+ 2. <b>Different roles</b>: some models, such as Llama, allow assigning different content to different roles, in this case system or user. <br>
+ 3. <b>Chain of thought prompting</b>: a prompting technique in which a complex task is broken down into simple steps; it has been shown to outperform simple prompting. Refer to [COT prompting](https://www.promptingguide.ai/techniques/cot). <br>
 4. <b>Active Prompt</b>: in progress; it will be used along with different shots prompting.
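
As a rough illustration of settings 1-3 above, here is a minimal sketch of how a few-shot, role-based prompt with an optional chain-of-thought instruction might be assembled. The `messages` structure follows the widely used chat-completions convention; the instruction wording, example reports, and labels are hypothetical placeholders, not the leaderboard's actual prompts:

```python
# Hypothetical few-shot pool: (report text, gold label) pairs.
FEWSHOT_EXAMPLES = [
    ("Chest X-ray shows clear lung fields.", "Normal"),
    ("CT reveals a 2 cm nodule in the right upper lobe.", "Abnormal"),
]

def build_messages(report: str, n_shots: int = 1, chain_of_thought: bool = False):
    """Assemble a chat-style prompt: a system role for the task
    instruction, user/assistant turns for the shots, then the query."""
    instruction = "Classify the radiology report as Normal or Abnormal."
    if chain_of_thought:
        # Chain of thought: ask the model to reason step by step first.
        instruction += " Think step by step, then give the final label."
    messages = [{"role": "system", "content": instruction}]
    for text, label in FEWSHOT_EXAMPLES[:n_shots]:  # 0-shot skips this loop
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": report})
    return messages

# 1-shot with chain-of-thought enabled:
print(build_messages("MRI shows no acute abnormality.", n_shots=1, chain_of_thought=True))
```

Varying `n_shots` (0, 1, 5) and toggling `chain_of_thought` reproduces the kinds of setting combinations the leaderboard reports; models without a system role would fold the instruction into the first user turn instead.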
 
 <h2 style="color: #00ff00;">Submit your model or dataset:</h2>