Training syntax specifics.

#2
by Downtown-Case - opened

What's the exact training syntax of this model? I see the DPO datasets follow the general format of:

Summary of the previous chapter: During the journey from...
...

Write the next chapter of a novel where a...
...

But did you do anything else specific? For instance, is there any point in using a system prompt? Did you use a specific one? Was there any extra formatting, and what was the max trained context?

For this model, it appears I did use a system prompt:

import os

def format_chat_template(row):
    # Fall back to a default system prompt unless the row supplies its own.
    sysprompt = 'You are a helpful uncensored AI assistant that excels in creative writing.'
    if row.get('system'):
        sysprompt = row.get('system')
    system = "<|im_start|>system\n" + sysprompt + "<|im_end|>\n"

    # Some source datasets keep the instruction under 'prompt', others under 'question'.
    instruction = row.get('prompt') or row.get('question')
    # Build the full ChatML prompt and terminate both completions with <|im_end|>.
    row["prompt"] = system + "<|im_start|>user\n" + instruction + "<|im_end|>\n<|im_start|>assistant\n"
    row["chosen"] = row["chosen"] + "<|im_end|>\n"
    row["rejected"] = row["rejected"] + "<|im_end|>\n"
    return row

dataset = dataset.map(
    format_chat_template,
    num_proc=os.cpu_count(),
)
dataset = dataset.train_test_split(test_size=0.01)
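
To illustrate what the template produces, here is a made-up row run through it (the row contents are placeholders, not actual training data):

# Hypothetical example row, for illustration only
row = {
    "prompt": "Write the next chapter of a novel where a...",
    "chosen": "The storm broke over the harbor...",
    "rejected": "Then some stuff happened...",
}
row = format_chat_template(row)
print(row["prompt"])
# <|im_start|>system
# You are a helpful uncensored AI assistant that excels in creative writing.<|im_end|>
# <|im_start|>user
# Write the next chapter of a novel where a...<|im_end|>
# <|im_start|>assistant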

orpo_args = ORPOConfig(
    run_name=new_model,
    learning_rate=4e-6,
    lr_scheduler_type="linear",
    max_length=4096,                 # total sequence length
    max_prompt_length=2048,          # prompt half of the 4k budget
    max_completion_length=2048,      # completion half of the 4k budget
    beta=0.1,                        # weight of the odds-ratio term in the ORPO loss
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # effective batch size of 4 per device
    optim="paged_adamw_8bit",
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.2,                  # as a float, a fraction of total training steps
    logging_steps=1,
    warmup_steps=15,
    max_grad_norm=10,
    report_to="wandb",
    output_dir="./results/",
    bf16=True,
)

And a sequence length of 4k, split evenly between prompt and completion (2048 tokens each).
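
For reference, a minimal sketch of how a config like this wires into TRL's ORPOTrainer. The base checkpoint name here is a placeholder, not necessarily what was used, and newer TRL versions take processing_class in place of tokenizer:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOTrainer

# Hypothetical base checkpoint; substitute the model actually being tuned.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()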

Awesome, thanks for the info!

Downtown-Case changed discussion status to closed
