update README.md

README.md CHANGED
@@ -71,7 +71,7 @@ We evaluate our model on several academic benchmarks then compare with other sim
 | HellaSwag(0-shot) | 82.03 | 81.57 | 83.32 |
 
 
-**Note:** To facilitate reproduction, the results of common benchmarks are generated by [OpenCompass](https://github.com/open-compass/opencompass), except HumanEval and MBPP, where we experienced code timeout and post-processing issues.
+**Note:** To facilitate reproduction, the results of common benchmarks are generated by [OpenCompass](https://github.com/open-compass/opencompass), except HumanEval and MBPP, where we experienced code timeout and post-processing issues.
 
 ### Chat Model
 
@@ -85,9 +85,6 @@ We present the performance results of our chat model and other LLM on various st
 | Arena-Hard | 24.2 | 42.6 | 43.1 |
 | GSM8K | 81.42 | 79.45 | 84.04 |
 | MATH | 42.28 | 54.06 | 51.48 |
-| USMLE | 58.70 | 55.84 | 79.70 |
-| CFA 2.0 | 35.5 | 42.5 | 62.75 |
-
 
 ### Long Context
 
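The note above points readers to [OpenCompass](https://github.com/open-compass/opencompass) for reproducing the common-benchmark numbers. For orientation only, here is a minimal sketch of what an OpenCompass evaluation config can look like; the dataset import path and model fields vary across OpenCompass releases, and `org/model-name` is a placeholder Hugging Face repo id, not this project's checkpoint.

```python
# Minimal OpenCompass config sketch (illustrative assumption, not taken from this repo).
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM

with read_base():
    # Reuse a dataset config shipped with OpenCompass (GSM8K as an example);
    # the exact module path differs between OpenCompass versions.
    from opencompass.configs.datasets.gsm8k.gsm8k_gen import gsm8k_datasets

datasets = [*gsm8k_datasets]

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='base-model',
        path='org/model-name',   # placeholder model id
        max_out_len=512,
        max_seq_len=4096,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```

Saved as a config file, such a setup is typically launched with OpenCompass's `run.py` entry point, and results are written to the run's output directory.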