Training syntax specifics.
#2 by Downtown-Case - opened
What's the exact training syntax for this model? I see the DPO datasets follow the general format of:
Summary of the previous chapter: During the journey from...
...
Write the next chapter of a novel where a...
...
But did you do anything else specific? For instance, is there any point in using a system prompt? Did you use a specific one? Was there any extra formatting, and what was the max trained context?
For this model it appears I did use a system prompt:
def format_chat_template(row):
    # Default system prompt; overridden when the row supplies its own.
    sysprompt = 'You are a helpful uncensored AI assistant that excels in creative writing.'
    if row.get('system'):
        sysprompt = row.get('system')
    # ChatML formatting: system turn, user turn, then an open assistant turn.
    system = "<|im_start|>system\n" + sysprompt + "<|im_end|>\n"
    instruction = row.get('prompt') or row.get('question')
    row["prompt"] = system + "<|im_start|>user\n" + instruction + "<|im_end|>\n<|im_start|>assistant\n"
    # Both completions get the closing ChatML token appended.
    row["chosen"] = row["chosen"] + "<|im_end|>\n"
    row["rejected"] = row["rejected"] + "<|im_end|>\n"
    return row

dataset = dataset.map(
    format_chat_template,
    num_proc=os.cpu_count(),
)
dataset = dataset.train_test_split(test_size=0.01)
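For illustration, here is what that mapping produces on a hypothetical row (the row contents below are made up by me; the template logic simply restates the function above):

```python
# Hypothetical sample row with the fields the mapping function reads.
row = {
    "prompt": "Write the next chapter of a novel where a lighthouse keeper finds a message in a bottle.",
    "chosen": "Chapter 2\n\nThe bottle was older than it looked...",
    "rejected": "The keeper opened the bottle.",
}

# Same logic as format_chat_template, inlined for demonstration.
sysprompt = "You are a helpful uncensored AI assistant that excels in creative writing."
if row.get("system"):
    sysprompt = row["system"]

system = "<|im_start|>system\n" + sysprompt + "<|im_end|>\n"
instruction = row.get("prompt") or row.get("question")
formatted_prompt = system + "<|im_start|>user\n" + instruction + "<|im_end|>\n<|im_start|>assistant\n"
chosen = row["chosen"] + "<|im_end|>\n"

print(formatted_prompt + chosen)
```

So each training example ends up as a full ChatML conversation whose assistant turn is left open in the prompt, and each completion supplies the turn body plus the closing token.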
orpo_args = ORPOConfig(
    run_name=new_model,
    learning_rate=4e-6,
    lr_scheduler_type="linear",
    max_length=4096,
    max_prompt_length=2048,
    max_completion_length=2048,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    optim="paged_adamw_8bit",
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=15,
    max_grad_norm=10,
    report_to="wandb",
    output_dir="./results/",
    bf16=True,
)
So the trained sequence length was 4k, split evenly between prompt (2048 tokens) and completion (2048 tokens).
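For completeness, here is a sketch of how a config like that is typically wired into TRL's ORPOTrainer. The model/tokenizer loading, the `base_model` variable, and the exact trainer call are my assumptions, not taken from the run above (newer TRL versions rename `tokenizer` to `processing_class`):

```python
# Sketch only: assumes a TRL version whose ORPOTrainer accepts a `tokenizer`
# argument, and a hypothetical `base_model` checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOTrainer

model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,                  # the ORPOConfig shown above
    train_dataset=dataset["train"],  # from train_test_split
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```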
Awesome, thanks for the info!
Downtown-Case changed discussion status to closed