anthonysicilia committed • Commit a0d1f66 • Parent(s): c8b6a8d
Update README.md

README.md CHANGED
@@ -11,7 +11,33 @@ we expect lower, non-zero temperatures to be best for sampling.
 
 The **"Implicit Forecaster"** ([available here](https://huggingface.co/anthonysicilia/Llama-3.1-8B-FortUneDial-ImplicitForecaster)) is trained with SFT to output
 the estimated probability using the logit for the token " Yes".
-In the paper, this model performed best overall . Temperature should be the default value (i.e., 1)
+In the paper, this model performed best overall. Temperature should be the default value (i.e., 1).
+
+Here's a comparison of these models with some previous runs of GPT-4 (no fine-tuning). We use data priors and temperature scaling for both models (see paper for details).
+
+| model                 | alg          | instances | Brier Score |
+|:----------------------|:-------------|:----------|:------------|
+| Llama-3.1-8B-Instruct | DF RL interp | awry      | 0.255467    |
+|                       |              | casino    | 0.216955    |
+|                       |              | cmv       | 0.261726    |
+|                       |              | deals     | 0.174899    |
+|                       |              | deleted   | 0.255129    |
+|                       |              | donations | 0.251880    |
+|                       |              | supreme   | 0.231955    |
+| Llama-3.1-8B-Instruct | IF SFT       | awry      | 0.220083    |
+|                       |              | casino    | 0.196558    |
+|                       |              | cmv       | 0.207542    |
+|                       |              | deals     | 0.118853    |
+|                       |              | deleted   | 0.114553    |
+|                       |              | donations | 0.238121    |
+|                       |              | supreme   | 0.223060    |
+| OpenAI GPT-4          | None         | awry      | 0.247775    |
+|                       |              | casino    | 0.204828    |
+|                       |              | cmv       | 0.230229    |
+|                       |              | deals     | 0.132760    |
+|                       |              | deleted   | 0.169750    |
+|                       |              | donations | 0.262453    |
+|                       |              | supreme   | 0.230321    |
 
 Note, for the best performance, certain prompt-engineering and post-processing procedures should be used (details in the paper).
 
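For readers who want to try the " Yes"-logit readout described in the updated README text, here is a minimal sketch using Hugging Face Transformers. The chat message, the Yes/No normalization, and the dtype choice are illustrative assumptions; the paper's prompt engineering, data priors, and post-processing are not reproduced here.

```python
# Minimal sketch (assumptions flagged in comments), showing one way to read the
# Implicit Forecaster's estimate off the " Yes" token logit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anthonysicilia/Llama-3.1-8B-FortUneDial-ImplicitForecaster"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Placeholder prompt: the prompt engineering from the paper should be used instead.
messages = [{
    "role": "user",
    "content": "Dialogue: ...\n\nWill this conversation end successfully? Answer Yes or No.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    # Next-token logits at the final position of the prompt.
    logits = model(input_ids).logits[0, -1]

# The forecast is read from the token " Yes" (note the leading space).
yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[0]
no_id = tokenizer.encode(" No", add_special_tokens=False)[0]

# Assumption: normalize over the " Yes"/" No" pair to get a probability; the
# paper's exact post-processing (data priors, temperature scaling) may differ.
p_yes = torch.softmax(logits[[yes_id, no_id]], dim=-1)[0].item()
print(f"Estimated probability: {p_yes:.3f}")
```

Per the README text above, sampling temperature can stay at the default value (1) for this model; any further calibration follows the procedures described in the paper.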