anthonysicilia committed
Commit a0d1f66
Parent: c8b6a8d

Update README.md

Files changed (1)
  1. README.md +27 -1
README.md CHANGED
@@ -11,7 +11,33 @@ we expect lower, non-zero temperatures to be best for sampling.

 The **"Implicit Forecaster"** ([available here](https://huggingface.co/anthonysicilia/Llama-3.1-8B-FortUneDial-ImplicitForecaster)) is trained with SFT to output
 the estimated probability using the logit for the token " Yes".
- In the paper, this model performed best overall . Temperature should be the default value (i.e., 1)
+ In the paper, this model performed best overall. Temperature should be the default value (i.e., 1).
+
+ Here's a comparison of these models with some previous runs of GPT-4 (no fine-tuning). We use data priors and temperature scaling for both models (see paper for details).
+
+ | model | alg | instances | Brier Score |
+ |:----------------------|:-------------|:-------------------|:----------|
+ | Llama-3.1-8B-Instruct | DF RL interp | awry | 0.255467 |
+ | | | casino | 0.216955 |
+ | | | cmv | 0.261726 |
+ | | | deals | 0.174899 |
+ | | | deleted | 0.255129 |
+ | | | donations | 0.251880 |
+ | | | supreme | 0.231955 |
+ | Llama-3.1-8B-Instruct | IF SFT | awry | 0.220083 |
+ | | | casino | 0.196558 |
+ | | | cmv | 0.207542 |
+ | | | deals | 0.118853 |
+ | | | deleted | 0.114553 |
+ | | | donations | 0.238121 |
+ | | | supreme | 0.223060 |
+ | OpenAI GPT 4 | None | awry | 0.247775 |
+ | | | casino | 0.204828 |
+ | | | cmv | 0.230229 |
+ | | | deals | 0.132760 |
+ | | | deleted | 0.169750 |
+ | | | donations | 0.262453 |
+ | | | supreme | 0.230321 |
 
 Note, for the best performance, certain prompt-engineering and post-processing procedures should be used (details in the paper).
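
The updated README reads the estimated probability off the logit for the token " Yes". As a rough illustration, here is a minimal sketch of that readout with `transformers`; the prompt string is a placeholder and the normalization against " No" is an assumption, since the exact prompt template and post-processing are described in the paper.

```python
# Minimal sketch (assumptions noted inline), not the paper's exact pipeline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anthonysicilia/Llama-3.1-8B-FortUneDial-ImplicitForecaster"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder prompt: the real dialogue/question template is given in the paper.
prompt = "<dialogue and forecasting question here> Answer Yes or No:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]

yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[0]
no_id = tokenizer.encode(" No", add_special_tokens=False)[0]

# Assumption: normalize the " Yes" logit against " No" at the default temperature (1),
# matching the recommendation above; other normalizations are possible.
pair = torch.stack([next_token_logits[yes_id], next_token_logits[no_id]])
p_yes = torch.softmax(pair / 1.0, dim=0)[0].item()
print(f"Estimated probability of 'Yes': {p_yes:.3f}")
```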
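
The table above reports Brier scores (mean squared error between forecast probabilities and the 0/1 outcomes; lower is better), computed after applying data priors and temperature scaling. The exact calibration procedure is in the paper; the snippet below is only a generic illustration of the two pieces, with made-up numbers.

```python
# Generic illustration only; the paper's data-prior and temperature-scaling
# procedure may differ in its details.
import numpy as np

def brier_score(p: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error between forecast probabilities p and binary outcomes y."""
    return float(np.mean((p - y) ** 2))

def temperature_scale(log_odds: np.ndarray, T: float) -> np.ndarray:
    """Recalibrate a 'Yes' log-odds score by dividing by temperature T before the sigmoid."""
    return 1.0 / (1.0 + np.exp(-log_odds / T))

# Toy numbers (not from the paper): three forecasts against their outcomes.
p = np.array([0.8, 0.3, 0.6])
y = np.array([1, 0, 1])
print(round(brier_score(p, y), 3))           # 0.097
print(temperature_scale(np.array([1.2, -0.4]), T=1.5))  # softened probabilities
```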