cicdatopea
committed on
Update README.md
README.md
CHANGED
@@ -58,21 +58,21 @@ print(response)
## That sounds like the start of an exciting story. What kind of adventures does she like? Is she into hiking, traveling, trying new foods, or something else? Tell me more about her!

##BF16:
-##

prompt = "Which one is larger, 9.11 or 9.8"
##INT4:
## 9.11 is larger than 9.8.

##BF16:
-##

prompt = "How many r in strawberry."
##INT4:
## There are 2 R's in the word "strawberry".

##BF16:
-##

prompt = "Once upon a time,"
##INT4:
@@ -82,7 +82,13 @@ prompt = "Once upon a time,"
## As she set out on her quest, Sophia encountered a wise old wizard named Zephyr,

##BF16:
-##

```
### Evaluate the model
@@ -94,19 +100,20 @@ pip3 install lm-eval==0.4.5
| Metric | BF16 | INT4 |
| --------------------------- | ------------------------ | ------------------------ |
-| avg |
-| leaderboard_mmlu_pro 5shot |
-| leaderboard_ifeval |
-|
-|
-|
-|
-|
-|
-|
-|
-|
-|

## Generate the model
@@ -58,21 +58,21 @@ print(response)
## That sounds like the start of an exciting story. What kind of adventures does she like? Is she into hiking, traveling, trying new foods, or something else? Tell me more about her!

##BF16:
+## That sounds like the start of an exciting story. The girl who likes adventure, let's call her Alex, is probably always looking for her next thrill. She might enjoy activities like hiking, rock climbing, or exploring new places. Perhaps she's always been drawn to the unknown and loves to challenge herself to try new things.

prompt = "Which one is larger, 9.11 or 9.8"
##INT4:
## 9.11 is larger than 9.8.

##BF16:
+## 9.11 is larger than 9.8.

prompt = "How many r in strawberry."
##INT4:
## There are 2 R's in the word "strawberry".

##BF16:
+## There are 2 R's in the word "strawberry".

prompt = "Once upon a time,"
##INT4:
@@ -82,7 +82,13 @@ prompt = "Once upon a time,"
## As she set out on her quest, Sophia encountered a wise old wizard named Zephyr,

##BF16:
+## ...in a far-off kingdom, where the sun dipped into the horizon and painted the sky with hues of crimson and gold, there lived a young adventurer named Sophia. She had hair as black as the night and eyes as blue as the clearest summer sky. Sophia was known throughout the land for her bravery, kindness, and insatiable curiosity.
+## What would you like to happen next in the story? Would you like Sophia to:
+## A) Embark on a quest to find a legendary treasure
+## B) Encounter a mysterious stranger with a hidden agenda
+## C) Discover a magical forest filled with ancient secrets
+## D) Something entirely different (please specify)
+## Choose your response to progress the story!

```
### Evaluate the model
@@ -94,19 +100,20 @@ pip3 install lm-eval==0.4.5

| Metric | BF16 | INT4 |
| --------------------------- | ------------------------ | ------------------------ |
+| avg | 0.7023 | 0.7033 |
+| leaderboard_mmlu_pro 5shot | 0.5484 | 0.5328 |
+| leaderboard_ifeval | 0.6661=(0.7110+0.6211)/2 | 0.7132=(0.7554+0.6710)/2 |
+| mmlu | 0.8195 | 0.8164 |
+| lambada_openai | 0.7528 | 0.7599 |
+| hellaswag | 0.6575 | 0.6540 |
+| winogrande | 0.7869 | 0.7932 |
+| piqa | 0.8303 | 0.8254 |
+| truthfulqa_mc1 | 0.4284 | 0.4272 |
+| openbookqa | 0.3720 | 0.3540 |
+| boolq | 0.8865 | 0.8826 |
+| arc_easy | 0.8624 | 0.8577 |
+| arc_challenge | 0.6109 | 0.6015 |
+| gsm8k(5shot) strict match | 0.9083 | 0.9249 |
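
The README does not say how the `avg` row is derived. A minimal sketch, assuming it is the unweighted mean of the 13 task rows and that `leaderboard_ifeval` is the mean of its two listed sub-scores, reproduces the reported values. The `scores` dictionary and `column_mean` helper are illustrative names, not part of the repository.

```python
# Scores copied from the table above as (BF16, INT4) pairs.
# Assumption: "avg" is the plain mean over these 13 rows.
scores = {
    "leaderboard_mmlu_pro 5shot": (0.5484, 0.5328),
    # The table lists ifeval as the mean of two sub-scores.
    "leaderboard_ifeval": ((0.7110 + 0.6211) / 2, (0.7554 + 0.6710) / 2),
    "mmlu": (0.8195, 0.8164),
    "lambada_openai": (0.7528, 0.7599),
    "hellaswag": (0.6575, 0.6540),
    "winogrande": (0.7869, 0.7932),
    "piqa": (0.8303, 0.8254),
    "truthfulqa_mc1": (0.4284, 0.4272),
    "openbookqa": (0.3720, 0.3540),
    "boolq": (0.8865, 0.8826),
    "arc_easy": (0.8624, 0.8577),
    "arc_challenge": (0.6109, 0.6015),
    "gsm8k(5shot) strict match": (0.9083, 0.9249),
}


def column_mean(column: int) -> float:
    """Unweighted mean of one column (0 = BF16, 1 = INT4)."""
    values = [pair[column] for pair in scores.values()]
    return sum(values) / len(values)


bf16_avg = column_mean(0)
int4_avg = column_mean(1)
print(f"BF16 avg: {bf16_avg:.4f}")  # 0.7023
print(f"INT4 avg: {int4_avg:.4f}")  # 0.7033
```

Under these assumptions the two averages differ by about 0.001, so the INT4 model tracks BF16 closely in aggregate even though individual tasks move in both directions.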

## Generate the model