zhiyucheng omrialmog committed on
Commit a0a0bc4 · verified · 1 Parent(s): 539ac9a

Update README.md (#1)

- Update README.md (921f2498410ea24731d7d9968d000aaf519ed668)
- Update README.md (9090e1488c31145268d822bb8166f9f3c80ffb17)
- Update README.md (d93f80d03ef27e1af39b94cac166d281bcc7f0a4)


Co-authored-by: Omri Almog <[email protected]>

Files changed (1)
  1. README.md +25 -18
README.md CHANGED
@@ -1,6 +1,9 @@
  ---
  base_model:
  - meta-llama/Llama-3.1-405B-Instruct
+ license: llama3.1
+ pipeline_tag: text-generation
+ library_name: transformers
  ---
  # Model Overview
 
@@ -77,39 +80,37 @@ python examples/llama/convert_checkpoint.py --model_dir Llama-3.1-405B-Instruct-
  trtllm-build --checkpoint_dir /ckpt --output_dir /engine
  ```
 
- * Accuracy evaluation:
-
- 1) Prepare the MMLU dataset:
- ```sh
- mkdir data; wget https://people.eecs.berkeley.edu/~hendrycks/data.tar -O data/mmlu.tar
- tar -xf data/mmlu.tar -C data && mv data/data data/mmlu
- ```
-
- 2) Measure MMLU:
-
- ```sh
- python examples/mmlu.py --engine_dir ./engine --tokenizer_dir Llama-3.1-405B-Instruct-FP8/ --test_trt_llm --data_dir data/mmlu
- ```
-
  * Throughputs evaluation:
 
  Please refer to the [TensorRT-LLM benchmarking documentation](https://github.com/NVIDIA/TensorRT-LLM/blob/main/benchmarks/Suite.md) for details.
 
  #### Evaluation
- The accuracy (MMLU, 5-shot) and throughputs (tokens per second, TPS) benchmark results are presented in the table below:
+
  <table>
  <tr>
  <td><strong>Precision</strong>
  </td>
  <td><strong>MMLU</strong>
  </td>
+ <td><strong>GSM8K (CoT) </strong>
+ </td>
+ <td><strong>ARC Challenge</strong>
+ </td>
+ <td><strong>IFEVAL</strong>
+ </td>
  <td><strong>TPS</strong>
  </td>
  </tr>
  <tr>
- <td>FP16
+ <td>BF16
+ </td>
+ <td>87.3
+ </td>
+ <td>96.8
  </td>
- <td>86.6
+ <td>96.9
+ </td>
+ <td>88.6
  </td>
  <td>275.0
  </td>

@@ -117,7 +118,13 @@ The accuracy (MMLU, 5-shot) and throughputs (tokens per second, TPS) benchmark r
  <tr>
  <td>FP8
  </td>
- <td>86.2
+ <td>87.4
+ </td>
+ <td>96.2
+ </td>
+ <td>96.4
+ </td>
+ <td>90.4
  </td>
  <td>469.78
  </td>
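
For quick reference, the model-card metadata block of README.md after this commit, assembled from the first hunk above:

```yaml
---
base_model:
- meta-llama/Llama-3.1-405B-Instruct
license: llama3.1
pipeline_tag: text-generation
library_name: transformers
---
```

And the updated benchmark table, assembled from the second and third hunks (accuracy on MMLU, GSM8K with chain-of-thought, ARC Challenge, and IFEVAL, plus throughput in tokens per second):

| Precision | MMLU | GSM8K (CoT) | ARC Challenge | IFEVAL | TPS    |
|-----------|------|-------------|---------------|--------|--------|
| BF16      | 87.3 | 96.8        | 96.9          | 88.6   | 275.0  |
| FP8       | 87.4 | 96.2        | 96.4          | 90.4   | 469.78 |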