
This is a Llama 3.1 fine-tune using the RL algorithm and benchmark data proposed in the paper "Deal or no deal (or who knows)", published in ACL Findings 2024. Models from this paper predict the outcome of an unfolding conversation by estimating the probability that a given outcome will occur. For instance, these models can estimate the probability that a deal will be reached before the end of a negotiation.

The "Direct Forecaster" (the model in this repo) is trained with RL to output the probability directly in its sampled tokens. In the paper, this model handled out-of-distribution data the best. Based on our experiments, we expect low, non-zero temperatures to work best for sampling.
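Since the Direct Forecaster writes its probability estimate into the sampled tokens themselves, the number has to be parsed back out of the generated text. Below is a minimal sketch of such post-processing, assuming the model answers with either a bare decimal or a percentage; the actual output format and post-processing procedure are described in the paper and repo.

```python
import re

def parse_probability(generated_text: str):
    """Extract the first probability-like number from sampled model
    output. Accepts a percentage ("73%") or a bare decimal ("0.42");
    returns None if nothing is found."""
    # Percentages take priority: "73%" -> 0.73
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", generated_text)
    if match:
        return float(match.group(1)) / 100.0
    # Otherwise look for a decimal in [0, 1]
    match = re.search(r"\b(?:0?\.\d+|0|1(?:\.0+)?)\b", generated_text)
    if match:
        return float(match.group(0))
    return None

print(parse_probability("I estimate a 73% chance of a deal."))  # 0.73
print(parse_probability("Probability: 0.42"))                   # 0.42
```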

The "Implicit Forecaster" (available here) is trained with SFT to output the estimated probability via the logit for the token " Yes". In the paper, this model performed best overall. Temperature should be left at the default value (i.e., 1).
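For the Implicit Forecaster, the probability is read from the model's next-token logits rather than from sampled text. Assuming the relevant alternatives are the tokens " Yes" and " No" (the " No" token is an assumption here; see the paper for the exact setup), a two-way softmax reduces to a sigmoid of the logit difference:

```python
import math

def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Turn the next-token logits for ' Yes' and ' No' (assumed
    alternatives) into a probability via a two-way softmax,
    i.e. sigmoid(yes_logit - no_logit)."""
    return 1.0 / (1.0 + math.exp(no_logit - yes_logit))

print(yes_probability(2.0, 2.0))  # 0.5 -- equal logits, no preference
```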

Here's a comparison of these models with some previous runs of GPT-4 (no fine-tuning). We use data priors and temperature scaling for both models (see the paper for details).
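For context, temperature scaling is a standard post-hoc calibration step: map the predicted probability to a logit, divide by a temperature fitted on held-out data, and map back. The sketch below shows only that step, not the data-prior blending from the paper, and `T` is a hypothetical fitted value:

```python
import math

def temperature_scale(p: float, T: float) -> float:
    """Calibrate a probability by scaling its logit by 1/T.
    T > 1 pulls predictions toward 0.5; T < 1 sharpens them."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)   # avoid log(0) at the extremes
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-logit / T))

print(temperature_scale(0.9, 2.0))  # ~0.75, softened toward 0.5
```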

| model | alg | instances | Brier score |
| --- | --- | --- | --- |
| Llama-3.1-8B-Instruct DF | RL interp | awry | 0.255467 |
| | | casino | 0.216955 |
| | | cmv | 0.261726 |
| | | deals | 0.174899 |
| | | deleted | 0.255129 |
| | | donations | 0.251880 |
| | | supreme | 0.231955 |
| Llama-3.1-8B-Instruct IF | SFT | awry | 0.220083 |
| | | casino | 0.196558 |
| | | cmv | 0.207542 |
| | | deals | 0.118853 |
| | | deleted | 0.114553 |
| | | donations | 0.238121 |
| | | supreme | 0.223060 |
| OpenAI GPT-4 | None | awry | 0.247775 |
| | | casino | 0.204828 |
| | | cmv | 0.230229 |
| | | deals | 0.132760 |
| | | deleted | 0.169750 |
| | | donations | 0.262453 |
| | | supreme | 0.230321 |
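The Brier score reported above is the mean squared error between forecast probabilities and binary outcomes, so lower is better; a constant 0.5 forecast scores 0.25 regardless of the outcomes. A quick sketch:

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and
    binary outcomes (1 = event occurred, 0 = it did not)."""
    assert len(probs) == len(outcomes)
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

print(brier_score([0.8, 0.3, 0.6], [1, 0, 1]))  # (0.04 + 0.09 + 0.16) / 3, about 0.0967
```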

Note that for the best performance, certain prompt-engineering and post-processing procedures should be used (details in the paper).

The GitHub repo (here) is also available if you wish to train new models with similar training algorithms. The repo also contains plenty of examples of how to use these models for inference and how to load them from a local directory.

For any questions, please feel free to reach out!

Some quantization details are given below:


library_name: peft

Training procedure

The following bitsandbytes quantization config was used during training:

  • quant_method: QuantizationMethod.BITS_AND_BYTES
  • _load_in_8bit: False
  • _load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: float16
  • bnb_4bit_quant_storage: uint8
  • load_in_4bit: True
  • load_in_8bit: False
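For reference, the settings above correspond to the `BitsAndBytesConfig` class in `transformers`. A sketch of recreating the same 4-bit NF4 setup when loading a base model (the `from_pretrained` call is shown only as a comment, since the model name depends on your setup):

```python
import torch
from transformers import BitsAndBytesConfig

# Reconstruct the 4-bit quantization config listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,
)

# Pass it when loading, e.g.:
# model = AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)
```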

Framework versions

  • PEFT 0.5.0
