anthonysicilia committed • Commit a0d1f66 • Parent(s): c8b6a8d
Update README.md

README.md CHANGED
@@ -11,7 +11,33 @@ we expect lower, non-zero temperatures to be best for sampling.
 
 The **"Implicit Forecaster"** ([available here](https://huggingface.co/anthonysicilia/Llama-3.1-8B-FortUneDial-ImplicitForecaster)) is trained with SFT to output
 the estimated probability using the logit for the token " Yes".
-In the paper, this model performed best overall . Temperature should be the default value (i.e., 1)
+In the paper, this model performed best overall. Temperature should be the default value (i.e., 1).
+
+Here's a comparison of these models with some previous runs of GPT-4 (no fine-tuning). We use data priors and temperature scaling for both models (see paper for details).
+
+| model                 | alg          | instances | Brier Score |
+|:----------------------|:-------------|:----------|:------------|
+| Llama-3.1-8B-Instruct | DF RL interp | awry      | 0.255467    |
+|                       |              | casino    | 0.216955    |
+|                       |              | cmv       | 0.261726    |
+|                       |              | deals     | 0.174899    |
+|                       |              | deleted   | 0.255129    |
+|                       |              | donations | 0.251880    |
+|                       |              | supreme   | 0.231955    |
+| Llama-3.1-8B-Instruct | IF SFT       | awry      | 0.220083    |
+|                       |              | casino    | 0.196558    |
+|                       |              | cmv       | 0.207542    |
+|                       |              | deals     | 0.118853    |
+|                       |              | deleted   | 0.114553    |
+|                       |              | donations | 0.238121    |
+|                       |              | supreme   | 0.223060    |
+| OpenAI GPT-4          | None         | awry      | 0.247775    |
+|                       |              | casino    | 0.204828    |
+|                       |              | cmv       | 0.230229    |
+|                       |              | deals     | 0.132760    |
+|                       |              | deleted   | 0.169750    |
+|                       |              | donations | 0.262453    |
+|                       |              | supreme   | 0.230321    |
 
 Note, for the best performance, certain prompt-engineering and post-processing procedures should be used (details in the paper).
 
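For readers who want to try the " Yes"-logit readout described in the updated README text, here is a minimal sketch using Hugging Face Transformers. The chat message, the Yes/No normalization, and the dtype choice are illustrative assumptions; the paper's prompt engineering, data priors, and post-processing are not reproduced here.

```python
# Minimal sketch (assumptions flagged in comments), showing one way to read the
# Implicit Forecaster's estimate off the " Yes" token logit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anthonysicilia/Llama-3.1-8B-FortUneDial-ImplicitForecaster"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Placeholder prompt: the prompt engineering from the paper should be used instead.
messages = [{
    "role": "user",
    "content": "Dialogue: ...\n\nWill this conversation end successfully? Answer Yes or No.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    # Next-token logits at the final position of the prompt.
    logits = model(input_ids).logits[0, -1]

# The forecast is read from the token " Yes" (note the leading space).
yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[0]
no_id = tokenizer.encode(" No", add_special_tokens=False)[0]

# Assumption: normalize over the " Yes"/" No" pair to get a probability; the
# paper's exact post-processing (data priors, temperature scaling) may differ.
p_yes = torch.softmax(logits[[yes_id, no_id]], dim=-1)[0].item()
print(f"Estimated probability: {p_yes:.3f}")
```

Per the README text above, sampling temperature can stay at the default value (1) for this model; any further calibration follows the procedures described in the paper.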